Huari Quechua morphological glosses


suffix gloss¹ meaning

- 1 first person (preceding vowel is lengthened) 

-shaa ~ -shaq 1.FUT first person, future tense

-ma 1.OBJ first person object

-ntsi ~ -ntsik 1.PL.INCL first person inclusive, plural 

-shun 1.PL.INCL.FUT first person inclusive plural, future tense 

-shayki 1.SUB>2.OBJ.FUT first person subject acts on second person object, future tense

-nki ~ -yki 2 second person 

-n 3 third person 

-nqa 3.FUT third person, future tense

-shunki 3.SBJ>2.OBJ third person subject acts on second person object

-shu ... -nki 3.SBJ>2.OBJ third person subject acts on second person object, past
tense (in verbal constructions in the past tense, the
sequence -shu-nki can be split up, e.g. mutsa-shu-rqa-nki
kiss-3.SBJ>2.OBJ-PST-3.SBJ>2.OBJ ‘s/he kissed you’)

-piq ~ -pita ABL ablative (spatial and temporal) 

-pis ~ -si ADD additive (‘also’, ‘even’) 

-q AG agentive 

-m ~ -mi / -chaa ASS assertion

-sapa AUGM augmentative 

-paa ~ -paq BEN benefactive (nominal)

-pu BEN benefactive (verbal) 

-tsi / -raykur CAUS causative 

-mu CIS cislocative 

-chi CONJ conjectural 


1 These are the Quechua glosses developed in the DFG-funded project *Zweisprachige Prosodie: 
Metrik, Rhythmus und Intonation in mehrsprachigen Kontaktsituationen* (PI Uli Reich, Freie Uni- 
versität Berlin). See https://www.geisteswissenschaften.fu-berlin.de/we05/institut/mitarbeiter/reich/ 
forschung/DFG-projekt-zweisprachige-Prosodie/Manuals/Glossen HP 20190626.pdf and section 2.1. 
“~” indicates variation between phonologically similar forms, “/” indicates variation in form but with
the same meaning as far as could be determined. If two forms have the same gloss but are listed in 
separate lines, their meaning is similar enough to warrant the same gloss, but not enough is known 
about the semantics of (one of) the forms in question to assign two different glosses. 

-tsun ~ -tsuraa CONJ conjectural

-raq ~ -raa ~ -ran ~ -ra CONT continuative (‘still’)

-man DEST destination, goal 

-taa ~ -taq ~ -tan DETVAR variable determination (“contrastive”) 

-rku ~ -rka DIR directional (kawallu-man lluqa-rku-n ‘s/he gets on the 
horse’) 

-rpu ~ -rpa DIR downwards directional (hita-rpu-n ‘s/he throws it downwards’)

-yku ~ -yka DIR inwards directional (chura-yku-n ‘s/he puts (it) inside’) 

-rqu ~ -rqa DIR outwards directional (aywa-rqu-n ‘he goes out’) 

-na DISC discontinuative (‘now’, ‘already’) 

-ni FON euphonic (prevents segmental sequences prohibited by 
phonotactics) 

-pa GEN genitive 

-y INF infinitive (used also in imperatives) 

-wan INST instrumental-comitative 

-ntin INTEG integrative 

-man IRR irrealis (conditional) 

-shwan IRR.1PL irrealis (conditional), first person plural 

-pa / -ri / -ska ~ -ski ITER iterative 

-lla LIM limitative (used also as diminutive) 

-chaw LOC locative 

-ku ~ -ka / -kaa MID middle voice 

-taaku NEG negator 

-tsu NEG negator 

-na / -nqa / NMLZ nominalizer 

-ta OBJ object 

-kuna PL plural (nominal) 

-yaa ~ -ya PL plural (verbal) 

-yuq POSS possessive (qillay-yuq money-POSS ‘someone who has 
money’) 

-yka PROG progressive aspect 



-naa ~ -naq PST.REP reportative past tense

-rqa ~ -ra ~ -rqu PST past tense

-q PST.HAB habitual past 

-shqa ~ -shaa ~ -sh PRTCP past participle

-ku Q polar question 

-naku RECP reciprocal (mutsa-naku-yaa-n ‘they kiss each other’)

-sh ~ -shi REP reportative 

-naw SIMIL similarity 

-pti SUBDIFF subordinator, different subjects in matrix and subord. 
clause 

-shpa / -r SUBID subordinator, coreferential subjects in matrix and 
subord. clause 

-naaqa SUPL superlative degree comparison 

-yaq TERM terminative (‘until’, ‘up to’) 

-qa TOP topic (focal) 

-paku / -ykacha UNSPEC non-specific reference 

-ya VBLZ.INTR verbalizer (intransitive) 

-tsaa ~ -tsa VBLZ.TR verbalizer (transitive) 

stem gloss meaning 

ka COP copula 

tsay / washa DEM.DIST distal demonstrative 

kay DEM.PROX proximal demonstrative 

mana NEG negator (particle) 

na PSSP passe-partout morpheme (placeholder, stand-in for any stem)


Glosses from other languages cited in this work 


affix gloss² meaning language

-no GEN genitive Japanese 

go- HON honorific Japanese 

-ga NOM nominative Japanese 

-wa TOP topic Japanese 

-ari DAT.SG dative singular Northern Bizcayan Basque 
'-ari DAT.PL dative plural (preaccenting) Northern Bizcayan Basque 
-an GEN.SG genitive singular Northern Bizcayan Basque 
-an GEN.PL genitive plural (preaccenting) Northern Bizcayan Basque
-I ACC accusative Turkish 

-ya DAT dative Turkish 

'-ymis EVID evidential (preaccenting) Turkish 

-Iyor IMPERF imperfective Turkish

-lar PL plural Turkish 

-an REL relativizer Turkish 


2 For the origin of these glosses, see references in the text. 
1 Introduction


This thesis describes the intonation of two varieties of Spanish and Quechua that 
are spoken by the same bilingual speaker community in Huari, Peru. Spanish and 
Quechua are generally seen to be typologically quite different and genetically unre- 
lated, but in large parts of Latin America they have had a common history of being 
spoken by the same people, in some cases for centuries. In Huari, both Quechua and 
Spanish are spoken by nearly everyone and used daily in interaction. This means 
that for a lot of daily communicative functions, Huari speakers have (at least) two 
ways of putting them into language at their disposal. The thesis (and the research 
project out of which it has emerged) takes as its basis the assumption that using 
equivalent communicative interactions as a tertium comparationis provides a unique
opportunity for studying how one way of putting things into language differs from 
another. This allows us to see what kind of constraints are specific to only one of 
those ways (linguistic variants or systems), and which are perhaps shared. Doing 
this across different types of communicative situations even allows us to see what 
kinds of interactive functions (meanings) are encoded in what way in these systems. 

Intonation is particularly interesting in this regard, since in many languages 
including Spanish it has been found to convey a variety of interactional meanings 
that have to do with how a discourse progresses, how knowledge is negotiated by 
its participants and what their subjective takes on it are. For Quechua on the other 
hand, many similar meanings have been found to be encoded via morphology, 
yet the question whether this means that intonation/prosody plays no role for the 
expression of such meanings in Quechua has mostly not been addressed. In order 
to be able to say something about what kinds of meaning the prosody of a language 
encodes, its prosodic system first has to be described. Yet for this, it is important to 
know what meanings are present in the interaction. Fortunately, this last part is
broadly dealt with via the general methodology of comparing the same kinds of 
interactions across both languages. Thus this thesis proposes to describe the intona- 
tional systems of Huari Quechua and Huari Spanish (both previously undescribed) 
from observing their uses by the same speakers in similar interactions, and in this 
way to learn something about how the interactional meanings that can be found 
to play a role are encoded. This fills a research gap in that no detailed intonational 
and prosodic descriptions exist of varieties of Quechua or Spanish that are geo- 
graphically or typologically close to those of Huari, and almost no studies that relate 
prosody to discourse meaning in a pair of varieties of these languages spoken in the 
same speaker community. 


Therefore the leading questions, subject to further expansion and refinement 
(in chapter 4), are: 


(1) What are the relevant properties of the intonational systems of Huari Spanish
and Huari Quechua?


(2) How and what kinds of interactional/discourse meaning do they encode?


(3) Which of these properties are specific to one language, and which are perhaps
common to both?


The third question has a subquestion that will resurface from time to time through- 
out the thesis, namely whether certain kinds of intonational systems lend them- 
selves better to the encoding of meaning than others. 

To answer these questions, the thesis is structured as follows. Chapter 2 gives 
a short introduction to Huari and the general method. Chapter 3 provides a theo- 
retical background for the analytical chapters, introducing the models of intona- 
tional and pragmatic analysis used and their assumptions, as well as setting out 
the dimensions along which intonational systems have crosslinguistically been 
shown to vary. This is done because the prosody of Quechua is largely undescribed 
and some of the existing descriptions are potentially influenced by the perceptual 
filters of linguists reared on European languages. Since I am also a European lin- 
guist (and many of my readers will be, too), I try to work against this bias by initially 
mapping out what options are known to be possible and thus to expand the expec- 
tational horizon. Based on the theoretical discussion in chapter 3, chapter 4 then 
provides further refined research questions. Answers to them are then sought in 
the analytical chapters on Huari Spanish (chapter 5) and Huari Quechua (chapter 
6) intonation. Finally, chapter 7 first summarizes the results from the analytical 
chapters and then concludes the thesis on a discussion of how some of the results 
from the two languages can be brought together. 

Before heading into the fray, I should address the elephant in the room. This
thesis is not about language contact as such, even though its objects are two genetically
unrelated language varieties spoken by the same bilingual speakers. I subscribe
to the view that language contact is ubiquitous even in speaker communities
traditionally considered monolingual, and thus a fundamental component of
variation, which the thesis is very much about (cf. Mufwene 2001; Enfield 2014;
Otheguy et al. 2015; Höder 2018). I will however make no claim about the origin of
variable features that turn out to be shared between the two languages. It seems to 
me that the logic of assuming that a feature observed in Huari Spanish and Huari 
Quechua “comes from” Quechua “into” Spanish because it is not known from other
descriptions of Spanish is faulty in the circumstances of this project. It might well
have been in existence in Huari Spanish for several generations, in which case it is
simply part of Spanish for the speakers, or actually part of their repertoire across
both languages. It might also have “come” from an as yet undescribed neighbouring
variety of Spanish. Language contact only really becomes a meaningful concept
once larger populations are considered diachronically. This thesis is a very local- 
ized and synchronic endeavour. My focus is on describing features of the prosodic 
systems in both languages and then comparing them, making plausible what can be
thought of as common and convergent, or specific and divergent, in this speaker 
community. In any case, a detailed and empirically grounded description of what 
multilingual speakers can be seen to actually do at a certain point in time and space 
may serve as a data point for future studies that are interested in the possibility of 
the propagation of features across longer spatial and temporal distances. 


2 Data & general methods 


This chapter provides an overview of the methods and procedures used in this
thesis, both for data collection and analysis. It also serves to situate the data in a 
sociolinguistic context via a brief description of the region in the Peruvian Andes 
they come from, and some key sociodemographic characteristics of their bilingual
speakers. 


2.1 Data collection 


The speech data analyzed in this work was collected by Raúl Bendezú Araujo, Uli
Reich, and the author during two field trips to the central Andean town of Huari in
Ancash, Peru, in August–October 2015 and April–June 2017. The field trips were
part of a research project funded by the Deutsche Forschungsgemeinschaft (DFG) 
with the title Zweisprachige Prosodie: Metrik, Rhythmus und Intonation zwischen 
Spanisch und Quechua (“Bilingual prosody: meter, rhythm, and intonation between 
Spanish and Quechua”, PI Uli Reich).³ Potential participants for the recordings were
contacted via a friend-of-a-friend approach, through initial contact with Leonel
Menacho López in Huaraz and Gabriel Barreto Echiparra in Huari, both of whom 
acted also as native speaker consultants for the production of the experimental 
materials and in general. Their contributions were invaluable to the success of the 
research project as a whole. Speakers participated of their own free will and gave
written consent to be recorded and have their data published anonymously.
They were remunerated financially for their time. Recordings took place in various 
localities in Huari. Efforts were made to record in silent, closed environments and 
to reduce background noise during recordings as much as possible. However, the 
surrounding soundscape of a rural Andean town could not always be excluded. 
The data used in this thesis comes from those recordings that were the best both in 
terms of recording quality and naturalness of production, i.e. from speakers that 
spoke freely and comfortably, and enjoyed participating in the tasks. Recordings 
were made using a mounted Røde NT-1A condenser microphone and a Marantz 


3 The DFG-project number is 274614727. Its website is at: https://www.geisteswissenschaften.fu- 
berlin.de/we05/institut/mitarbeiter/reich/forschung/DFG-projekt-zweisprachige-Prosodie/index. 
html. It contains links to those data that are already available online, descriptions of the experimen- 
tal tasks, and a full list of contributors to the project. The project extends the same elicitation meth- 
ods to other language pairs in Latin America, including Guaraní-Spanish in Asunción (Paraguay) 
and Nheengatú-Portuguese in São Gabriel da Cachoeira (Brazil), for which data is in the process of
being published on the website. 
PMD 670 audio recorder (in 2015) or a Zoom H4N audio recorder (in 2017) in 44.1
kHz (BPM). The Quechua data recorded in 2015 were transcribed and translated
by bilingual students and Leonel Menacho at the Universidad Nacional Santiago
Antúnez de Mayolo (UNASAM) in Huaraz, and their morphology annotated by students
at the Pontificia Universidad Católica del Perú (PUCP) in Lima and Leonel
Menacho. They were paid for their work. Transcriptions and annotations were
checked for consistency by Raúl Bendezú Araujo and the author, in accordance with
the glossing conventions devised by them, Uli Reich, and Leonel Menacho, based
on Parker (1976) and Max Planck Institute for Evolutionary Anthropology (2015).
The full list of glosses can be found at the beginning of this thesis. They are also the
glosses used in each Huari Quechua example in this study. The Spanish data were
transcribed and translated by students at Freie Universität Berlin, Raúl Bendezú
Araujo, and the author.


2.2 Brief description of Huari and its linguistic situation 


Figure 1: Map of Huari within Ancash (left), and within Peru (right). © OpenStreetMap.


Huari is a small town of about 5000 inhabitants, the capital of the province of 
Huari in the department of Ancash, whose capital is Huaraz. It lies on the eastern 
slopes of the cordillera blanca range of the central Andes in the Conchucos region 
(cf. Figure 1). Until 2016, the main road into Conchucos was not fully tarmacked 
(Wikipedia 2021b), limiting access to the region. The area is one of the regions in 
Peru with the highest share of the population giving Quechua as their first language 
(cf. the left map in Figure 2). In Huari province, 72.7% of the population stated in


Figure 2: Maps of the share of the population that names Quechua as first language per province 
in central and southern Peru (left)⁴ and of the distribution of the Quechuan languages in the same
area (right, from Adelaar & Muysken 2004: 184). 


2017 that Quechua is the language in which they learned to speak, while 24.7% said
that it is Spanish, out of a total population of about 60,000 (Instituto Nacional de
Estadística e Informática 2018). The variety of Quechua spoken in Huari belongs to
the branch of Quechuan languages⁵ that is variously called Quechua I, Quechua B,
or central Quechua. I will refer to it as Quechua I when necessary. Quechua I vari- 
eties are spoken in the central Peruvian Andes, mainly in the mountainous parts 
of the departments Ancash, Huánuco, Junín, Pasco, and Lima (Adelaar & Muysken 
2004: 185, cf. also the map on the right in Figure 2). The other main branch of 
Quechuan is called Quechua II or A, and its varieties are spoken in southern Peru 
including Cuzco, Bolivia, and a small enclave in Argentina, as well as in Ecuador, 
areas of Amazonian Colombia and parts of northern Peru. The Quechua I branch 
has a far greater internal diversity in a much smaller area, which is why the
area where its varieties are spoken has been suggested as the Quechua homeland 
(Adelaar & Muysken 2004: 181). The Quechua varieties in Conchucos both differ 
from those spoken in Huaylas on the other side of the cordillera blanca and show a 
considerable internal (phonological) diversity (Parker 1976: 28). Diversification is 
plausibly due to the mountainous terrain facilitating the maintenance of very local 


4 Map adapted from Wikipedia (2021a) based on data from Instituto Nacional de Estadística e 
Informática (2018). 

5 All in all, 8–10 million people are estimated to speak languages belonging to the Quechuan family
in an area extending from Putumayo in Southern Colombia throughout Ecuador and Peru and into 
parts of Bolivia, with small numbers of speakers also in Argentina and Chile, and internal diversity 
in the family comparable to that in the Romance or Slavic families (Adelaar & Muysken 2004: 168). 
linguistic traditions. In the northern part of the Conchucos region, Sihuas Quechua
and Corongo Quechua have been claimed and partially described as separate vari- 
eties (Parker 1976: 28; Hintz 2000; Hintz & Hintz 2017), while the Quechua spoken 
in the southern part including Huari province has been called South Conchucos 
Quechua but also described with some further internal variation (Hintz 2006, 2007; 
Hintz 2014; Hintz & Hintz 2017). For practical purposes, I will call the data sample 
analyzed in this work simply Huari Quechua and Huari Spanish by virtue of it 
having been recorded in Huari by speakers living in Huari town. The analysis itself 
will show how much variation still exists in this sample,⁶ but I will remain agnostic
about how much difference there might have to be in order to say that it consists 
of separate varieties or whether this is even a useful measure. Parker (1976) is a 
grammar of Quechua varieties that include Huari Quechua. More general gram- 
matical descriptions of Quechua can be found in Cerrón-Palomino (1987); Adelaar 
& Muysken (2004). These works have a strong focus on morphosyntax and segmen- 
tal (morpho-)phonology. Apart from brief impressionistic descriptions in Parker 
(1976) and comparatively short studies of the South Conchucos Quechua stress 
system (Stewart 1984; Hintz 2006), as well as our own previous work on acoustic 
correlates (Buchholz & Reich 2018), the prosody of both Huari Quechua and Huari 
Spanish is undescribed. See section 3.3.3 for a more detailed review of what is (not) 
known about the prosodic characteristics of Quechua in Conchucos. 

Regarding Spanish, to my knowledge almost no studies from any linguistic sub- 
discipline exist of a variety from a region close to Conchucos. Andrade Ciudad (2016) 
is a study of northern Peruvian Spanish varieties (not their prosody) that includes 
data from the northernmost tip of Ancash, but specifically from Pallasca province 
(Andrade Ciudad 2016: 32), where Quechua bilingualism is actually very low (0-10% 
Quechua as first language, cf. both maps in Figure 2). Andrade Ciudad (2021) lists 
only one further small work on any variety of Spanish from the entire department 
of Ancash apart from Andrade Ciudad (2016). He concludes that what has come to 
be called “Andean Spanish” in the literature is a label for a somewhat heterogeneous
cluster of varieties and almost exclusively drawn from studies of bilingual southern
Peruvian regions, especially Cuzco (Andrade Ciudad 2021: 128–133).⁷ One of the


6 In terms of prosody this will be made explicit, but the examples in the work also demonstrate 
in passing, as it were, that variation exists at various levels. In Huari Spanish this concerns e.g. 
the realization of the 3rd person singular present indicative form of estar, which is está for most
speakers, but seemingly regularly tá for some (cf. note 177). In Quechua there is e.g. some variation
in pronunciation, as between tsuqllu [t͡suxʎu] and chuqllu [t͡ʃuxʎu] ‘corncob’ (cf. Figure 126 vs.
Figure 158).

7 He relates this to a broader academic tendency to favour the southern Andes and the Cuzco 
region as object of study in various disciplines (Andrade Ciudad 2021: 133). A similar imbalance 
most salient phonological features described for “Andean Spanish” is not attested
for the monolingual Northern Peruvian varieties studied by Andrade Ciudad (2016:
224–225): the variant realization of high and mid vowels called motoseo, in which
e.g. mesa ‘table’ is said to be produced as a homophone of misa ‘church mass’, or
poro ‘leek’ of puro ‘pure’, and vice versa. This phenomenon is usually connected to
bilingualism with Quechua, which only has three vowels /a i u/ or /a ɪ ʊ/ instead of the
Spanish five /a i u e o/.⁸ Since Huari is a fully bilingual area but the Quechua spoken
is very different from the Quechua II varieties in regions on which the description
of “Andean Spanish” has mostly been based (mutual intelligibility is usually said not
to exist at least between the two main branches of Quechua, cf. Parker 1976: 24, see
also Adelaar & Muysken 2004: 188–191 for a list of divergent features), neither what
has been described for southern “Andean Spanish” nor for the monolingual northern
varieties by Andrade Ciudad (2016) can simply be assumed to hold for Huari
Spanish. This applies also to the prosodic descriptions of Cuzco Spanish by O'Rourke 
(2005, 2012), for which also see sections 3.4.4 and 3.5.1. 

In Huari town, our own observations as well as statements by a number of 
people interviewed suggest that the overwhelming majority of inhabitants are
bilingual and use both languages in daily interaction, yet we also met a few individuals
who were functionally monolingual in either Spanish or Quechua.⁹ Short
interviews were conducted with all experimental participants. Apart from eliciting 
basic sociolinguistic characteristics (see next section), participants were also asked 
which language they would normally use in different communicative situations, 
e.g. to speak with one's family, with friends, with unknown people, with author- 
ities, in the market, etc. The responses from 39 bilingual participants that were 


seems to prevail in studies on Quechua, although a number of detailed descriptions of Quechua I 
varieties exist (cf. Adelaar & Muysken 2004: 193-194). Quechua I varieties are certainly underrep- 
resented in terms of instrumental prosodic studies compared to Cuzco Quechua (see sections 3.3.3 
and 3.7.3.2). 

8 Guion (2003) investigates quality of front and back vowels in bilingual speakers of Spanish and 
Ecuadorian Quichua and monolingual Spanish speakers and paints a more nuanced picture: some 
speakers do not have separate realizations for /i/ and /e/, and /u/ and /o/, respectively, but others
do. Some even produce three front vowels, two ([i] and [e]) for Spanish, and a distinct [ɪ] for
Quechua, with the most differential system most often found with the earliest bilinguals. Pérez 
Silva et al. (2008) present similar results on Cuzco Spanish and Quechua: less proficient bilinguals 
produce Spanish mid and high vowels with more overlap in F1 than Spanish monolinguals, but 
more proficient bilinguals have a similar vowel space to monolinguals. They make the important 
sociolinguistic point that the characterization that Quechua speakers “confuse” the vowels (e.g. 
produce [misa] meaning /mesa/ and vice versa) is a baseless racist prejudice against the indige- 
nous population. 

9 Data by one monolingual Quechua speaker is contained in Bendezú Araujo et al. (2019).
interviewed in 2015 are given in Figure 3. These are of course self-reports, so based
on the assessments of the speakers themselves, and therefore shaped by individual
biases and value judgements. They thus give a measure of the subjective perception
of language prevalence in a given situation rather than an objective measure of it.
Participants were all bilingual and selected via a friend-of-a-friend approach, so the
results are not representative of the population of Huari as a whole. However,
they still suggest tendencies regarding a possible functional differentiation according
to communicative domains between Huari Quechua and Spanish for bilingual
speakers.


[Figure 3 panels, one per communicative situation: family, business, market, friends, authorities, jokes, unfamiliar people, insults, important matters, daily matters, affection. Legend (language): Quechua, Spanish, Both.]

Figure 3: Responses from 39 bilingual speakers in Huari town regarding which language they
would normally use in different communicative situations. White space indicates where not all 39
participants responded to a question.¹⁰


Figure 3 suggests that the use of Spanish dominates overall, in 7 of the 11 communicative
situations asked about. It is especially prevalent in situations of social
distance between speakers (“business”, “authorities”, “unfamiliar people”), commerce
(“business”, “market”), or to discuss important issues. Quechua, in contrast,


10 One speaker gave "English or French" as response to which languages they would use with 
unfamiliar people. 
is dominant only in situations characterised by social proximity or expressive
needs (“family”, “insults”), but not all of them (cf. “affection”). It is notable that in
a number of situations of daily life (“friends”, “jokes”, “daily matters”, “affection”),
a large share of the participants (a third or more) stated that they use both languages
interchangeably. In fact, both languages were given as response by at least a
quarter of all participants for 6 of the 11 situations, suggesting a functional equivalence
between the two languages in these domains for them. In general, the results
suggest that at the moment of recording, both Quechua and Spanish are healthy 
and that speakers are indeed in general functionally bilingual across a number 
of domains. However, it also seems likely that Spanish is slowly encroaching upon 
more of the communicative domains occupied by Quechua. 


2.3 Short sociolinguistic profile of the speakers 


This section gives an overview of some basic sociolinguistic characteristics only
of those speakers whose data is analysed in this work. Overall, data from 27 speakers
(9f, 18m, median age 21 years) was used. A short interview was conducted with
each participant to obtain the sociolinguistic data. All information is thus based on 
self-description. At the time of recording, all speakers had their place of residence 
in Huari town. All speakers are bilingual¹¹ and fully literate in Spanish but do not
read or write Quechua. Table 1 gives the identifier code for each speaker that is 
also used in each example in this work (column 1), together with their age at the 
time of recording (column 2), their sex (column 3), place of birth (column 4), level 
of education (column 9) and occupation at the time of recording (column 10). It 
also provides data about which of the two languages they first acquired (column 
5), age of acquisition of the second language (column 6) and the response to two 
questions aimed at approximating a notion of which language the speakers are 
more dominant in or have a preference for.¹² They were ¿qué lengua te/le parece
más fácil? ‘which language is easier for you?’ (column 7) and ¿qué lengua utiliza(s)
más? ‘which language do you use more?’ (column 8). In columns 5, 7, and 8, “Q”
stands for Quechua, “Sp” for Spanish, and “+” for both. Not all speakers gave the


11 Speaker QP44 very rarely speaks Quechua, but that seemed partially due to a social stance ori- 
ented towards a life outside the limits of Huari rather than lack of linguistic competence. 

12 This is of course a bare-bones approximation. A thorough assessment of the degree of bilingualism
of the Huari speakers both individually and as a community is not possible on the basis of
the sociolinguistic data presented here and it is also not the goal of this study. See Schmeißer et al.
(2015); Treffers-Daller (2019) for a discussion of the complexities involved in defining and assessing 
language dominance in bilinguals. 
same response to both questions, suggesting that they in fact capture slightly different
aspects. It is notable that even though the first language for 20 of the 27
speakers is Quechua (about 75%, just as in the overall population according to the
2017 census), its share is far lower in response to both of these questions, with
only 2 stating that it is the language they use more, and 6 that it is easier. In both
cases, “both” is the most frequent response (14 both, 2 Quechua, 11 Spanish for
“most used”; 10 both, 6 Quechua, 10 Spanish for “easier”), again overall indicating
a fairly solid bilingualism both in these speakers and the community they come 
from. Clearly there is also some speaker-specific variation in the responses given 
here. It will however be seen in the analytical chapters that it is not a straightfor- 
ward predictor of the prosodic variation observed there. The rightmost column in 
Table 1 gives the types of experiment corpora that were used in this study from 
each speaker. They are described in the following section. Bolder horizontal lines 
in the table above and below the data from two speakers indicate that they were 
dialogue partners in the dialogical tasks. All participants who were partners in 
tasks are either close relatives, spouses or friends, and volunteered for the exper- 
iments together, except for SG15 & QF16, who volunteered individually and were 
partnered by us. 
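The shares behind these counts can be recomputed with a few lines of Python. This is only an illustrative sketch: the numbers are the ones reported in the paragraph above, while the variable and function names are hypothetical and not part of the project's materials. Percentages are taken over the responses actually given to each question.

```python
# Counts as reported in the text for the 27 speakers analysed in this work.
N_SPEAKERS = 27
FIRST_LANGUAGE_QUECHUA = 20

# Responses to the two dominance questions (hypothetical variable name).
RESPONSES = {
    "most used": {"both": 14, "Quechua": 2, "Spanish": 11},
    "easier": {"both": 10, "Quechua": 6, "Spanish": 10},
}


def answered(counts):
    """Number of speakers who gave a response to the question."""
    return sum(counts.values())


def shares(counts):
    """Share of each response in percent of the responses given."""
    total = answered(counts)
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


if __name__ == "__main__":
    print(round(100 * FIRST_LANGUAGE_QUECHUA / N_SPEAKERS, 1))  # 74.1
    for question, counts in RESPONSES.items():
        print(question, answered(counts), shares(counts))
```

With the counts as given, 26 rather than 27 responses are listed for “easier”, and “both” and Spanish are level at 10 each for that question.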


2.4 Description of the experimental tasks 


This section describes the communicative experimental tasks whose recordings are
the production data forming the database for this work. For the dialogical tasks,
speakers participated in pairs that stayed the same throughout all the tasks. Speaker
pairs are marked by bolder horizontal lines in Table 1 in the previous section. Participants
could choose which language to start in and first performed all of the tasks
in one language and then in the other, with breaks whenever they chose. They were
told to treat the tasks like communicative games and the overwhelming feedback
they gave was that they very much enjoyed themselves, many saying they would
like to come back and “play more games”.

None of the tasks required reading, since speakers are not literate in Quechua 
(but all are in Spanish). Verbal materials used in the tasks were thoroughly 
checked and approved by native speakers, Leonel Menacho and Gabriel Barreto 
in the case of Quechua, and Raúl Bendezú Araujo in the case of Spanish. They
appeared in the form of oral recordings in the tasks, spoken by Gabriel Barreto in 
Quechua and Raúl Bendezú Araujo and Gabriel Barreto in Spanish. The other type 
of material consisted of pictures of objects that occurred in different functions
throughout the tasks. They were chosen so that the names of the objects depicted 
are controlled metrically, i.e. they systematically vary heavy syllables ((C)VC or 
(C)V:) in any position in bi- and trisyllabic words in Quechua. Almost the same
set of pictures was used for Spanish, chosen so that the intended words varied 
stress position. Words and candidate pictures were checked for their aptness to 
elicit the intended words with Gabriel Barreto before being used in the tasks. The 
different combinations of heavy and light syllables in Quechua did not yield the 
same number of easily depictable words, as (4) shows. In the tasks, speakers also 
sometimes spontaneously chose other than the intended words for the objects on 
the pictures. 


(4) Intended Quechua words depicted by images of objects in the tasks, with heavy
(h) or light (l) syllables

a. h.h ach.kas¹⁷ ‘lamb’; is.may ‘excrement’

b. h.h.l aq.tsall.ku ‘hair of the corncob’

c. h.l all.qu ‘dog’; an.ka ‘eagle’; hir.ka ‘mountain’; tsiq.tsi ‘bat’; tsuk.lla ‘hut’

d. h.l.h pam.pa.kuy ‘funeral’; ull.tu.kuy ‘tadpole’

e. h.l.l pin.ku.llu ‘type of flute’; as.wa.na ‘clay vessel’

f. l.h a.ñas ‘skunk’; a.tuq ‘fox’; a.rash ‘lizard’

g. l.h.h ka.puq.yuq ‘wealthy person’

h. l.h.l cha.kall.wa ‘jawbone’; na.ran.ha ‘orange’

i. l.l qa.qa ‘rock’; ha.cha ‘tree’; ha.ka ‘guinea pig’; tsu.ku ‘hat’

j. l.l.h ha.tu.saq ‘giant, enormous’

k. l.l.l i.lla.pa / ti.lla.ku ‘lightning’; pi.tsa.na ‘broom’
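The weight patterns in (4) follow mechanically from the syllabified forms. The following Python sketch is only an illustration of that metrical control, not part of the study's materials. It assumes that syllables are separated by dots, that vowels are written with the letters a, i, u (plus e, o in loanwords), and that a long vowel is written as a doubled vowel letter; the function names are hypothetical.

```python
# Vowel letters of the orthography; e and o are included for Spanish loans (assumption).
VOWELS = set("aiueo")


def syllable_weight(syllable: str) -> str:
    """Return 'h' for a heavy syllable ((C)VC or (C)V:), 'l' for a light one."""
    s = syllable.lower()
    # A syllable ending in a consonant letter is closed, hence heavy.
    closed = s[-1] not in VOWELS
    # A doubled vowel letter is taken to mark a long vowel (assumption).
    long_vowel = any(a == b and a in VOWELS for a, b in zip(s, s[1:]))
    return "h" if closed or long_vowel else "l"


def weight_pattern(word: str) -> str:
    """Map a dot-syllabified word to its pattern, e.g. 'all.qu' to 'h.l'."""
    return ".".join(syllable_weight(s) for s in word.split("."))


if __name__ == "__main__":
    for word in ["all.qu", "pin.ku.llu", "ka.puq.yuq", "qa.qa"]:
        print(word, weight_pattern(word))
```

Because the check only looks at the final letter and at doubled vowels, digraphs such as ch, ll, sh and ts need no special treatment: a syllable ending in any of them ends in a consonant letter either way.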




17 For the orthographic transcription of Quechua, I use the official alphabet for Peruvian Quechua
varieties as set down by the Resolución Ministerial 12-18-85-ED from November, 1985 (cf.
Cerrón-Palomino 2008: 70-71). The phonological values for its characters correspond largely to the
ones in the IPA alphabet (International Phonetic Association 2005), except in the cases of «ch»,
«ll», «ñ», and «y», that are adopted from Spanish orthography and therefore correspond to /tʃ/, /ʎ/,
/ɲ/, and /j/, respectively. It uses only three vowel graphs «a», «i», «u» corresponding to the three
vowels of Quechua. I follow this use except in the case of Spanish loanwords in which I sometimes
use five vowels to make the word orthographically more recognisable to readers familiar with
Spanish. The consonantal phonemes of Huari Quechua are /p b t d k g q ts tʃ s ʃ h m n ɲ r l ʎ w
j/, with the voiced occlusives only occurring in loans from Spanish (adapting slightly from Parker
1976: 37-44). Quechua I varieties do not have the glottalized and aspirated occlusives of Cuzco 
Quechua (cf. section 3.2.1). Regarding actual phonetic realization, which is quite variable, each 
figure with a pitch track and spectrogramme of Quechua speech in this work contains a tier with a 
syllable-separated phonetic transcription following IPA alphabet conventions. 
(5) Intended Spanish words depicted by images of objects in the tasks

a. Proparoxytones águila ‘eagle’; murciélago ‘bat’; relámpago ‘lightning’;
mandíbula ‘jawbone’

b. Paroxytones escoba ‘broom’; gigante ‘giant’; roca ‘rock’; árbol ‘tree’; choza
‘hut’; naranja ‘orange’; zorro ‘fox’; zorrillo ‘skunk’; lagartija ‘lizard’; olla
‘pot’; renacuajo ‘tadpole’; perro ‘dog’; montaña ‘mountain’;
cordero ‘lamb’

c. Oxytones maní ‘peanut’; funeral ‘funeral’


Beyond those imposed by the instructions at the beginning of each task and the 
materials, there were no restrictions by the experimenters on the communicative 
behaviour of the participants during the tasks. Besides a role as game master in 
some of the tasks, the experimenters in general only interfered when participants 
had clearly misunderstood some of the instructions, or asked for help. In particular,
no interference was made when speakers used other than the intended words
for the game objects, or when they switched between languages. None of the tasks
involved scripted speech, and all of the speech data used in this work can therefore
be classified as at least semi-spontaneous. Raúl Bendezú Araujo and the author
were present as experimenters in all of the tasks whose data is used in this work,
with Raúl Bendezú speaking Quechua sufficiently to act as game master in some
of them. The following describes the individual tasks. All the data from dialogical 
tasks (Conc, Maptask, Cuento, Quien) were recorded in 2015. Data from the pseu- 
do-dialogical task Elqud was recorded in 2017. 

Conc. Short for concurso “guessing game”, and similar to the children’s game 
Memory. Twelve picture cards chosen according to the metrical control were laid 
out in a grid on the table between the participants, face up. The participants had 
half a minute to memorize where each picture was, then the cards were turned face 
down. The speakers then took turns saying where which picture was, one at a time. 
They were not allowed to touch the setup or to use gestures, so that they had to use 
spoken language. After a guess, the game master confirmed that he had understood 
correctly and then turned over the card the speaker had indicated. If they had been 
right, they scored a point, the card was put aside, and the speaker was allowed to 
continue. If not, the card was turned face down again and it was the other speaker’s 
turn. The game was over once all cards had been successfully identified. 

Maptask. Map tasks are an established type of elicitation game (cf. e.g. Ander- 
son et al. 1991; Grice & Savino 1997, 2003). In our version, the two participants were 
each given a map with pictures of objects whose names were controlled metrically. 
Only one participant’s map had a path drawn between the objects. The task con- 
sisted in communicating so that the participant without the path was able to draw 
the correct path also on their map. Participants were not allowed to look at each 
other's maps or to use gestures. The two maps were also different in that some of
the objects were at different locations in them. The speakers were not told about 
this initially, so that solving the task necessarily included resolving the communica- 
tive conflict that would arise from the mixed-up locations. The game was over once 
the speakers had communicated the correct path successfully to their satisfaction. 

Cuento. Spanish for “story”, this task consisted of a version of the game Chinese
whispers. One of the participants left the room, and the other was played a recording
of a short story containing many of the words chosen according to the metrical
control. The stories were invented by the three experimenters and revised and
read aloud by Gabriel Barreto for the Quechua recordings, and by Raúl Bendezú
Araujo for the Spanish recordings. The participant could ask to have the recording 
replayed until they were confident that they remembered it. Then the other partic- 
ipant reentered the recording room, and the first participant told the story to them. 
After hearing the story and being free to ask anything about it, the second partici- 
pant then told the story to the experimenters, while the first participant could help 
or correct them. 

Quién. Spanish for “who”, this is a task based on the game Who am I?. The
experimenters prepared four cards with names of people that they knew both participants
to know (very famous people or people known personally to all involved),
and one of the participants drew one of the cards in secret. The second participant
then had to find out about the identity of the person on the card using any type
of question, except for directly asking who it is via a wh-question, but with polar
guessing questions like “Is it person x?” allowed. The first participant had to answer
truthfully and to the best of their knowledge, without giving too much information 
away that wasn't directly asked for. The game was over once the person on the card 
had been correctly guessed or the second participant gave up. 

Elqud. This is the only task whose data is used here that did not involve two par- 
ticipants, but only simulates utterances in a dialogical context. It uses a combination 
of visual and audio stimuli. Images or short animations were prepared that showed 
short situations or interactions between entities and objects. For each such situation, 
a corresponding audio was recorded (checked for acceptability and appropriateness 
and spoken for the recordings in both languages by Gabriel Barreto) that described 
the situation. In each of the experimental items, as opposed to the fillers, there was 
an intended discrepancy between audio and visual stimulus. The discrepancy was 
intended to correspond to different elements in the experimental items, e.g. a noun 
phrase corresponding to one of the figures in the visual stimulus (“the black dog”), 
a noun or adjective corresponding to an attribute of one of the figures (“the white
dog”), a verb corresponding to an action or relation between the figures (“the dog
ate the bone”), or also more complex parts of the audio stimulus corresponding to
similarly complex relations in the visual stimulus, such as double-topic constructions
(“the white dog plays with the blue ball, the black dog plays with the red ball”).
Participants were shown a visual stimulus and heard the corresponding audio stimulus.
They were asked to treat the audio stimulus as a contribution by a dialogue partner
about the situation shown by the visual stimulus, and to respond to this contribution,
either by correcting it (in the experimental items), or by agreeing to it (in the
fillers). They were asked to respond as expansively as possible in general (verbalizing
as many of the elements shown), but were otherwise not constrained in the form of
their response, and no intervention was made when they responded elliptically or
counterfactually. In total, 69 stimulus pairs (including 17 fillers) each were created for
Quechua and Spanish, and speakers were shown them in two pseudo-randomized 
sequences, first in one language, then the other. 

The data from the three dialogical tasks Conc, Maptask, and Cuento forms the 
empirical backbone of the analyses throughout this study. Data from Quien (in both 
languages) and Elqud (only Spanish) is used where indicated, in particular in sections
6.1.6 and 5.2, respectively.


2.5 Partial availability of the speech data used in the analysis 


The Conc, Maptask, Cuento, and Quien speech data from speaker pairs TP03 & 
KP04, QZ13 & OZ14, AZ23 & ZZ24, ZR29 & HA30, XU31 & OA32, and XQ33 & LC34 
in Quechua and Spanish has been published with annotation files and metadata, 
as Bendezú Araujo et al. (2019) and Bendezú Araujo et al. (2021), respectively. It is 
freely available online¹⁸ for any interested party to be used non-commercially with
attribution (Creative Commons licence CC BY-NC-SA 4.0).


2.6 General method 


Overall, this study uses a mixture of qualitative and quantitative analysis. In all 
sections, the analytical argument is based on an in-depth knowledge of the data 
acquired through several years of study and by the author's having been present 
during all of the recordings. In some sections, the results are only exemplified using 
individual examples. Other sections also employ quantification where possible, 
and occasionally also inferential statistical analysis. 


18 At https://refubium.fu-berlin.de/handle/fub188/25747 and https://refubium.fu-berlin.de/handle/ 
fub188/29497, respectively. Note that Conc is called Memoria in the published data. 
2.7 Acoustical analysis, visualization and statistics


Acoustical analyses of the data were conducted in Praat (Boersma & Weenink 2020).
Scripts for the automatic extraction of measurements were written by the author,
partially based on routines taught by Rafèu Sichel-Bazin. For the generation of pitch
track, oscillogramme, and spectrogramme images used for the examples, a slightly
customized version of the script Create Pictures (Elvira-García 2017) was used with
a voicing threshold of 0.6 if not otherwise noted. Occasionally, pitch movement due
to consonantal microprosody or background noise was manually removed. Statistical
analyses were conducted in R (R Core Team 2020) using RStudio (RStudio
2009–2020). Plots from R were produced using ggplot2 (Wickham 2016). Further R
packages are cited in those sections for whose analysis they were used. Individual 
sections also contain further specific methodological descriptions. 


2.8 A short note on examples


Each Huari Spanish and Quechua example with a pitch track as well as each longer 
context example appearing in this study has a corresponding audio file hosted online 
via the Open Science Framework (Foster & Deardorff 2017) and published open 
access under Creative Commons licence CC BY-NC-SA 4.0, at https://doi.org/10.17605/OSF.IO/MAH8Z
(Buchholz 2021a). In electronic versions of this document that are
hyperlink-compatible, the audio file is accessible via the weblink in a separate foot- 
note for each pitch track or example. 

Appendix A also lists the stable URLs for all audio files so that they can be ac- 
cessed in document formats without hyperlink compatibility. In the text, examples 
are given a code, which is to be read as follows: speaker task language start time, e.g. 
KP04 Conc Q 1653 meaning that the example is by speaker KP04, from the Conc task 
in Quechua, starting at 165.3 seconds from the beginning of the recording of the task. 
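For readers who process the example codes automatically, the following Python sketch shows one way to decode them. It is a minimal illustration under stated assumptions, not a tool used in this study: the fields are assumed to be separated by spaces or underscores, the start time is assumed to be given in tenths of a second (as the reading of 1653 as 165.3 seconds implies), and the language abbreviation "S" for Spanish as well as all names in the code are hypothetical.

```python
from dataclasses import dataclass

# "Q" appears in the example above; "S" for Spanish is an assumption.
LANGUAGES = {"Q": "Quechua", "S": "Spanish"}


@dataclass(frozen=True)
class ExampleCode:
    speaker: str
    task: str
    language: str
    start_seconds: float


def parse_example_code(code: str) -> ExampleCode:
    """Split a code such as 'KP04 Conc Q 1653' into its four fields."""
    # Accept either spaces or underscores as separators (assumption).
    speaker, task, language, start = code.replace("_", " ").split()
    # The last field counts tenths of a second from the start of the recording.
    return ExampleCode(
        speaker=speaker,
        task=task,
        language=LANGUAGES.get(language, language),
        start_seconds=int(start) / 10,
    )


if __name__ == "__main__":
    print(parse_example_code("KP04 Conc Q 1653"))
```

An unknown language abbreviation is passed through unchanged rather than rejected, since the full set of abbreviations is not specified here.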


3 Theoretical background and literature review 


In this chapter I lay out the theoretical groundwork necessary for the analyt- 
ical chapters. An empirical investigation into the prosody of Huari Spanish and 
Quechua necessitates not only the establishment of theoretical and analytical ter- 
minology, but also an understanding of the range and limits of prosodic variability. 


3.1 Introduction 


In the following I will introduce and develop fundamental concepts (sections 3.2 
and 3.3), putting particular emphasis on the range of typological variation observed 
in the areas of stress, accent, and prosodic phrasing (section 3.4), and the relation 
between tones, prosodic structure, and segmental material (section 3.5). We will see 
that this variation goes beyond the descriptive coverage of labels such as “culminative
stress”, “delimitative phrasing”, or “segmental anchoring”. Such a detailed
exploration of the ranges of prosodic variability is relevant because only from a 
comprehensive viewpoint is it possible to characterize the prosodic phenomena we 
will observe in both Spanish and Quechua, and to bring them into an appropriate 
perspective towards each other. Not doing so would run the risk of studying Con- 
chucos Quechua, about whose prosody and intonation very little is known, with 
Spanish as an analytical foil, simply because so much more is already known about 
the prosody of Spanish varieties. With such a widened typological perspective, on 
the other hand, it will hopefully be better possible to analyze both local languages, 
Quechua and Spanish, on their own terms and to locate them, with their individual 
variability and areas of overlap and divergence, on the variation space mapped 
out by what is known about prosodic typology. Over the course of this chapter, the 
discussion of the relevant prosodic concepts therefore will often be based not only 
on what is already known about Spanish, but also about typologically more distant 
languages as comparison. Quechua will also feature wherever possible, but unfor- 
tunately, for the most part to mark gaps in our knowledge. 

Finally, the phenomena observed in the areas of pitch peak and accent distri- 
bution, as well as of (scaling-based) cues for prosodic phrasing in the analyses of 
Huari Quechua and Spanish (in particular sections 5.1.2, 5.1.3, 5.2, 6.2.3.3, and 6.4), 
make it necessary to include the issues of possible recursiveness in prosody and the 
prosodic cueing of information structure into the analysis. The theoretical ground- 
work for these later analytical decisions is also laid in this chapter, in sections 3.6 
and 3.7, respectively. Their analysis is intended to contribute to suggestions about 
how the prosodic variation space can be conceptualized cross-linguistically. 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the Creative
Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783111304595-003 
From the second half of the 20th century onwards, it has been increasingly
recognized that speech sounds exhibit systematic behaviour that can only be
described by going beyond individual segments. The study of these aspects of
speech is therefore sometimes called suprasegmental phonetics and phonology.
Linguistic sound systems organize individual speech sounds into larger units via
the concerted manipulation of the parameters¹⁹ pitch, loudness, spectral quality
of vowels and consonants, and duration, and they allow for the signaling of meanings
independent of those conveyed by the segmental string in particular by the
manipulation of pitch. All aspects of this system concerned with the grouping and
composition of units are together called prosody. The systematic manipulation of
pitch for postlexical linguistic functions²⁰ in particular is called intonation (cf. Ladd
2008: 5-7). Prosody and intonation are intimately connected, since prosody defines 
the domains on which an intonational event must occur in order to fulfill a specific 
function, and the two terms are often used nearly interchangeably. The restriction 
to postlexical functions distinguishes intonation from the use of pitch to distinguish 
lexical contrasts in tonal languages. 


19 The names given here are strictly speaking those of the perceptual/psychoacoustic correlates of 
physical properties of the acoustic signal. Pitch is the perceptual correlate of fundamental frequency,
or F0, loudness that of intensity (energy amplitude), vowel quality that of formant frequencies
above (overtones of) F0, consonant quality a mixture of that, aperiodic signals in the spectrum and 
timing measures such as voice onset time (VOT), and duration that of quantity (Ladefoged 2005; 
Fastl & Zwicker 2007; Ladefoged & Johnson 2011). The relationship between acoustic and psycho- 
acoustic correlates is not straightforward in human perception, see e.g. Traunmüller 1990, 2017; 
Dawson et al. 2017; Fahey & Diehl 1996; Kuang & Liberman 2016; Ladd et al. 2013. For practical 
purposes in intonation research, the labels are often used interchangeably (Ladd 2008: 5). 

20 Ladd (2008) stresses the distinction between linguistic and paralinguistic uses of pitch and other 
prosodic parameters. Paralinguistic uses signal emotional and attitudinal states of the speaker, and 
are characterized by a gradual and iconic relationship between what is signaled and the signal 
itself: paralinguistically, loudness signals arousal or emotional involvement; the louder, the more 
emotionally involved. Linguistic uses, on the other hand, are supposed to convey aspects of mean- 
ing that are discrete and categorical and be expressed by formal means that are also essentially dis- 
crete. It has long been recognized that intonation straddles the boundary between these domains 
(Bolinger 1978: 475 famously calling it a “half-tamed savage”), in particular because pitch variation
is by nature gradual and continuous, instead of discrete and categorical. The idea of paralinguistics 
is furthermore problematic because many meanings thought to belong to it, such as propositional 
speaker attitudes, are not only expressed by quite discrete morphosyntactic means in several lan- 
guages (e.g. German discourse particles, morphology for expressing mirativity, etc.), but can also 
be shown to be semantically well-defined, and expressible by equally well-defined intonational
cues (Fliessbach 2023). Ladd (2008: 34-42) maintains the distinction conceptually, but allows for 
the integration of gradual form into phonology. 
3.2 Prosodic units and their hierarchy


Prosody research has proposed the following hierarchy of units apart from segments:
segments group together to form onsets, nuclei and codas of syllables (σ).
Nucleus and coda together are called the rhyme. A rhythmic unit, the mora (μ), is
often positioned below the syllable, but is better seen as orthogonal to it. One or
several syllables together form feet (F or Σ), which in turn form prosodic or phonological
words (ω). Prosodic words are not isomorphic with lexical or morphological
words, but it is a reasonable heuristic to assume that they broadly map onto each
other. In Spanish, some morphological words such as adverbs formed by attaching
-mente to an adjectival root (e.g. rápidamente “quickly”, nuevamente “newly,
again”) are often produced as two prosodic words, as can be seen from them realizing
two stresses, and even pitch accents, one on the stressed syllable of the root,
and another on the penult of -mente. On the other hand, clitics do not affect stress
and pitch accent placement on the words they attach to (e.g. te lo traduzco “I translate
it for you”, cocinárselo “to cook it for oneself”). Thus on the criterion of being
the domain of stress assignment, they form a single prosodic word spanning more than
a single morphological one. For Quechua, this is largely unexplored. Given that
Quechua is agglutinating and allows for quite long polysyllabic words consisting
of a root plus a number of suffixes, observations about multiple stresses or accents
on a single word in several varieties (e.g. in Parker 1976; Adelaar 1977; Hintz 2006)
indicate that some morphological words are produced as more than one prosodic
word. Non-isomorphy between the morphological word and the prosodic word as
domain of stress assignment is assumed by Stewart (1984: 207–209) for Conchucos
Quechua,²¹ and by Levinsohn (1976: 20) for Inga Quechua (Quechua IIB, Southern
Colombia), with the domain of stress assignment sometimes larger, sometimes
smaller, than the morphological word.

Above the prosodic word, a number of units can be roughly classed into two 
phrasal groups: a smaller phrasal unit has been identified under the name of 
Accentual Phrase (AP), Minor Phrase, phonological phrase (PhP or φ), or intermediate
phrase (iP). A larger unit has been called Major Phrase or Intonational Phrase
(IP or ι), sometimes said to be equal to, or yet below, the utterance (U or υ). Their
attestation differs substantially between languages as well as between descriptive 
and theoretical approaches. Table 2, taken from Frota (2012: 257), gathers proposed 
hierarchies of units into three groups based on different approaches in the litera- 


21 Conchucos Quechua includes Huari Quechua. When giving information about a different Que- 
chua variety, I indicate the classificational group the variety belongs to (cf. section 2.2) as well as 
its geographic region. 
Table 2: Different prosodic hierarchies proposed in the literature, from Frota (2012: 257).


a. Rule-based                       b. Intonation-based     c. Prominence-based
Intonational Phrase (IP)            IP                      Nuclear Accent
Phonological / Major Phrase         Intermediate Phrase
Clitic Phrase / Minor Phrase /      Accentual Phrase        Accent
  Prosodic Word Group
Prosodic Word (PW)                  (PW)                    Stress
Foot                                Foot                    Full Vowel
Syllable                            Syllable                Syllable
Mora                                Mora


ture. Early proposals in the rule-based generative tradition (Selkirk 1984; Nespor 
& Vogel 2007) assume that the prosodic hierarchy is universal across all languages. 
Later approaches (Selkirk 1996, 2011; Ito & Mester 2007, 2012; Féry 2017) maintain 
this claim but reduce the inventory by making all domains recursive in a departure 
from earlier views. 


3.2.1 Different kinds of evidence for the existence of prosodic units 
in languages 


Empirically speaking, a prosodic unit should only be said to exist in a language if 
there is tangible evidence for it, i.e. if phonological or phonetic processes can be 
shown to make reference to it, or if it is perceptually or articulatorily robust. In 
many languages, prosodic domains constrain how segments are distributed. This is 
most frequently seen with syllables, which impose restrictions on segment distribution
in nearly all languages, based on preferences that enhance contrast and rhythmicity
(Vennemann 1988). Thus, e.g. the sequence /sp/ is not illicit per se but must
always straddle a boundary between two syllables in Spanish, because codas may 
not contain more than one consonant and in onsets the two segments do not create 
a sufficiently steep sonority gradient. But such restrictions can also act on larger 
domains: either an aspirated or a glottalized consonant, never both, may optionally 
occur only once per word (on the initial stop in the root and never in the suffixes) 
in Cuzco Quechua (Quechua IIC, southern Peru; Quechua I varieties do not have 
these consonants), unlike their simple counterparts, which are not constrained in 
this way (cf. (6); Parker 1997: 2; Cusihuamán 2001: 34). This restriction of once-per- 
domain is an example of a culminative property by which the relevance of a word- 
level unit can be argued, here for Cuzco Quechua. 
(6) Distribution of glottalized and aspirated consonants in Cuzco Quechua:
culminative but optional
a. t’anta “bread”
b. thanta “old, used up”
c. tayta “father, old man”
d. *tant’a
e. *tantha


Hyman (2006: 229) lists other properties besides culminativity that can be used to 
define prosodic word domains in a language. Phonological processes specific to a 
domain can e.g. affect the segmental makeup by allowing assimilation processes 
only within a domain, but blocking them at its edges. For /s/-aspiration in Spanish, the
domain of application differs between varieties: 


(7) /s/-aspiration in Spanish varieties (adapted from Strycharczuk & Kohlberger 2016)

                       Andalusian / Honduras   Río Negro Argentinian   Chinato Spanish   Buenos Aires Spanish
                       Spanish (Hualde 1991;   Spanish (Kaisse 1999)   (Hualde 1991)     (Kaisse 1999)
                       Kaisse 1999)
dieces “tens”          [die.seh]               [die.seh]               [die.ðeh]         [die.ses]
desigual “unequal”     [de.hi.gual]            [de.si.gual]            [de.ði.gual]      [de.si.gual]
dos palas              [doh.pa.lah]            [doh.pa.lah]            [doh.pa.lah]      [doh.pa.las]
  “two shovels”
dos alas               [do.ha.lah]             [do.ha.lah]             [do.ða.lah]       [do.sa.las]
  “two wings”


Data such as in (7) is often interpreted to show that in all varieties, the minimal condition
for aspiration of /s/ is for /s/ to be in coda position. However, what counts as
a coda position is influenced by another process, usually called “resyllabification”,
which causes postvocalic consonants to be produced as an onset when followed by
a vowel, even if a word boundary intervenes. The syllable boundaries, symbolized
by the dot (.), that are given in (7), are intended to be those after this “resyllabification”
has occurred. To explain (7), Hualde (1991) and Kaisse (1999) propose that
aspiration and resyllabification occur in different orders between the varieties (see
there for details). In Buenos Aires Spanish, but not in the other varieties, aspiration
does not occur before a pause (Kaisse 1999: 206-207), i.e. before a phrase boundary
(Strycharczuk & Kohlberger 2016: 2). Thus, /s/-aspiration interacts with the
boundaries of up to three different domains: syllables, prosodic words and (phonological)
phrases. Resyllabification also does not occur across pauses, but its complex
workings lead Cardinaletti & Repetti (2009) to propose a new prosodic constituent,
the “phrasal syllable level”, as its domain. In the next section we will see what the
pitfalls of proposing prosodic constituents based on individually observed phenomena
can be.


3.2.2 Universal aspects of the prosodic hierarchy 


Assuming universality of the units of the prosodic hierarchy would at this point 
mean postulating the phrasal syllable level also for all other languages, even if 
no processes ever take it as their domain apart from Spanish resyllabification. 
However, on the criterion of demonstrably being the domain of at least one pho- 
nological process in all languages, it turns out that few if any of the proposed units 
are really universal (cf. also Grijzenhout & Kabak 2009). Even the syllable, maybe 
the most universally accepted of these units, is perhaps not the domain of any 
phonological processes in at least one language, Gokana (Niger-Congo, cf. Hyman 
2011, 2015 for a discussion). The prosodic word has been argued not to be uni- 
versal based on several languages, including Vietnamese and Limbu (Sino-Tibetan, 
cf. Bickel et al. 2009; Schiering et al. 2010). In contrast, Himmelmann et al. (2018) 
present quite robust evidence that a unit corresponding to the intonational phrase 
can be identified in perception consistently even in languages the listeners are 
unfamiliar with. Its length also seems to average at 1.5 seconds in some data on 
English, French, and German reported on by Jun (2005d: 443), suggesting that there 
is at least some amount of overlap among observing linguists regarding what con- 
stitutes an IP crosslinguistically. 

Arguments made about segmental prosodic processes are often made based 
on symbolic, categorical data such as given in (7), but increasingly, instrumental 
(acoustic and articulatory) findings are also brought to bear on them. They have 
resulted in similar observations of gradual variability in segmental realizations 
depending on prosodic domains across a variety of languages and are in general 
known as the prosodic strengthening of domain edges. The two edges of domains 
do not show the same effects: very broadly speaking, domain-initial strengthening 
often seems to make consonants at the beginning of larger prosodic domains such 
as utterances or IPs more consonant-like, judging from articulatory measurements 
such as increased linguopalatal contact and acoustic measures such as increased 
voice onset time (Fougeron & Keating 1997; Onaka 2003; Keating et al. 2004; Keating 
2006; Cho & Keating 2009), while vowels enhance those features that make them 
more contrastive against other vowels (Georgeton & Fougeron 2014). On the other 
hand, final lengthening, as the name suggests, is an effect of increased duration at
the end of phrasal domains that can affect both individual segments as well as sylla- 
bles (Beckman & Edwards 1990; Rao 2007, 2010; Fletcher 2010; Petrone et al. 2017). 
The durational measurements are sometimes able to distinguish between posi- 
tions defined with respect not just to a single, but several, prosodic domains (e.g. 
Strycharczuk & Kohlberger (2016: 7-9) on /s/-realization in Peninsular Spanish). 
Although both of these phenomena have been observed across a variety of lan- 
guages and are thought to be “phonetic” markers of prosodic structure rather than 
(language-specifically) “phonological” ones (cf. Vaissière 1983; Keating 2006; Cho 
& Keating 2009: 466), individual studies also show differences both in terms of the 
precise nature and strength of the effects observed as well as the domain at which 
they occur between languages, and also between different information structural 
conditions (cf. Cruttenden 1997: 33; Fletcher 2010: 529–532). Differences between 
individual speakers should also not be discounted. For example, Strycharczuk & 
Kohlberger (2016: 9) note several degrees of durational domain-sensitivity in /s/-re- 
alization among their speakers, ranging from differential realization for each of the 
categories to total insensitivity across them. 


3.3 Intonation in the autosegmental-metrical model 


For the purposes of this work the most relevant phenomena identifying domains 
of the prosodic hierarchy are tonal ones. In the model of intonational phonology 
adopted here, the autosegmental-metrical (AM) model of intonation (Pierrehum- 
bert 1980; Ladd 2008), tones are represented on a tonal tier that is autonomous 
from the segmental tier²² and both are independently assigned to a specific level 
and position in the prosodic structure. This autonomy of tones and segments and 
their mediation via the prosodic structure (called “tune-text-association”) is vital 
for making sense of how tonal contours relate to segmental strings of different 
length and morphosyntactic complexity: 


(8) The “incredulity” contour on utterances of different length and complexity 
[Figure: each utterance is shown with a schematic rise-fall-rise contour above the text.] 
a. Bob?! 
b. The long-lost heir to the throne?! 
c. He's golfing tomorrow?! 


22 “Autosegmental” does not only mean that tones and segments are on separate tiers, although 
this was one of the original applications of autosegmental phonology (Goldsmith 1976). Different 
classes of segmental features are also taken to be located on separate tiers (e.g. McCarthy 1986; 
Hellmuth 2013; Venditti et al. 2008: 458–459). 

In (8), the “incredulity contour”²³ (Ward & Hirschberg 1985; Hirschberg & Ward 
1992), represented by the schematic rise-fall-rise above the text, is produced on 
three different utterances. In (8)a, it is realized on the monosyllabic name Bob, 
appropriate to a context where Bob has just been suggested as the answer to a 
pending question by someone else but prior to that was deemed by the speaker not 
to be a likely candidate for the question (e.g. who might cook a complex meal for 
six in the evening if the only culinary action the speaker has ever witnessed Bob 
performing was to bake a frozen pizza, and to burn it). In (8)b, it is produced on 
the complex noun phrase the long-lost heir to the throne, felicitous in a context e.g. 
where the speaker has just been told that Princess Peach, the royal scion, has made 
a public appearance and the speaker up to that point had believed the princess's 
whereabouts to be unknown. In (8)c, the same contour is found on the intransi- 
tive sentence he's golfing tomorrow, e.g. in a context where the speaker knows that 
an important parliamentary debate is taking place the next day and has just been 
told that the president will be golfing at the time. The domain for such a contour, 
and for all comparable ones, is taken to be the Intonational Phrase 
(Hirschberg & Ward 1992: 242). This is one reason why AM assumes tune-text-association 
to happen via the prosodic structure: the important point is that whether 
the contour is felicitous depends not on the morphosyntactic makeup of the utter- 
ance, nor its length (although it must certainly consist of at least one sufficiently 
sonorous sound), but only upon the appropriate context conditions and that it be 
realized on one intonational phrase, in the correct form. 

The correct form, however, depends on more than just proportionally adapting 
the tonal movement to the length of the text, spreading the rises and falls evenly 
across it: in both (8)b and (8)c, the high and low points in the contour are again 


23 Also discussed under various labels in Liberman & Sag (1974); Liberman (1975); Marneffe & 
Tonhauser (2019) and elsewhere. Descriptions of its meaning are multifaceted. Hirschberg & Ward 
(1992) find evidence that increased pitch excursion shifts listeners’ perception towards an 
“incredulity” reading of ambiguous sentences, while perception is shifted towards an “uncertainty” 
reading when pitch excursion is comparatively decreased. The same contour has also been found 
to evoke more negative scalar implicatures in listeners when used as an indirect answer to a polar 
question than a neutral declarative contour (Marneffe & Tonhauser 2019). I will use “incredulity” 
as a shorthand for the meanings associated with it. 

located relative to specific positions in the prosodic structure. AM assumes just two 
tonal primitives, a high (H) and a low (L) tone. The rise-fall-rise contour under dis- 
cussion here is usually taken to be made up of a sequence of four tones, LHLH. These 
are responsible for the pitch movement in the tune: only where a tone is specified 
is pitch actively manipulated; between tonally specified locations, any tonal move- 
ment is due to interpolation (Pierrehumbert 1980: 52). The tones in the phonologi- 
cal representation are not quite directly reflected in the actual pitch contour: they 
cause tonal targets to exist in the phonetic implementation, which is responsible 
for realizing the tonal contour together with the segmental string in speech pro- 
duction by, amongst other things, assigning pitch values relative to the overall pitch 
range of the utterance: high tonal targets, due to high tones, get assigned relatively 
high pitch values, low tonal targets, due to low tones, get assigned relatively low 
pitch values. In the LHLH contour, the position of the final H tone is determined 
simply by the right edge of the intonation phrase: it is as far to the right in the phrase as 
it can be. Such tones are called boundary tones. For the adequate specification of 
the other tones' position in the contour, the edges of prosodic units are not enough. 
That specification must make reference to the metrical structure of the utterance. 
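
The division of labour between tones, tonal targets and interpolation can be illustrated with a small sketch. It is only a toy model: the linear shape of the interpolation, the pitch range in Hz, the scaling of H and L to the top and bottom of that range, and the target times are all assumptions made for illustration, not claims about how phonetic implementation works in AM.

```python
from bisect import bisect_right

# Hypothetical pitch range of one utterance in Hz (illustrative values only).
RANGE_BOTTOM = 100.0
RANGE_TOP = 200.0


def target_pitch(tone):
    """Map a tonal primitive (H or L) to a pitch value relative to the range."""
    if tone == "H":
        return RANGE_TOP
    if tone == "L":
        return RANGE_BOTTOM
    raise ValueError(f"unknown tone: {tone!r}")


def contour(targets, t):
    """Pitch at time t, given (time, tone) targets sorted by time.

    Pitch is specified only at the targets; between them it is interpolated
    (linearly, by assumption). Outside the first and last target the nearest
    target value is held.
    """
    times = [time for time, _ in targets]
    values = [target_pitch(tone) for _, tone in targets]
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # index of the first target strictly after t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# The same LHLH tune on a short and a long phrase (times in seconds, invented).
short_phrase = [(0.00, "L"), (0.10, "H"), (0.25, "L"), (0.40, "H")]
long_phrase = [(0.00, "L"), (0.90, "H"), (1.10, "L"), (1.60, "H")]
```

In both phrases the four targets are the only places where pitch is specified; the longer phrase simply has longer stretches of interpolation between them, and the final H sits at the right edge in each case.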


3.3.1 Stress and metrical structure 


Metrical structure (Liberman & Prince 1977; Hayes 1995; Ladd 2008; Calhoun 
2010b) assigns strength relations (prominence) throughout the prosodic structure. 
At each level of prosodic structure, all the constituents of the level below that are 
dominated by a constituent at that level, are assigned a strength relation such that 
only one of them is strong (s), and the others are weak (w). 


(9)  Metrical structure for the phrase siete armadillos “seven armadillos” 

a. Tree notation (s = strong, w = weak) 

   PhP/ip (φ)          [ φ                                  ] 
   prosodic word (ω)   [ w          ]  [ s                  ] 
   foot (F)            [ s          ]  [ w       ] [ s      ] 
   syllable (σ)          s     w         s    w      s    w 
                         sje   te        ar   ma     di   ʎos 

b. Grid notation 

   PhP/ip (φ)                                        x 
   prosodic word (ω)     x                           x 
   foot (F)              x               x           x 
   syllable (σ)          x     x         x    x      x    x 
                         sje   te        ar   ma     di   ʎos 


In (9), the metrical structure for the phrase siete armadillos “seven armadillos” is 
given in branching tree notation (a) and grid notation (b). The two notations are 
effectively equivalent for our purposes. In both, prominence is built from the 
ground up. Crucially, it is relative: at the level of the foot, the two syllables [ar] 
and [di] in armadillos are still equal in strength, but at the level of the prosodic 
word, the foot which contains [ar] is weak relative to the strong one containing [di], 
which is how the fact that [di] is the stressed syllable in armadillos is represented. 
Strong nodes at one level must always be founded on a strong node at the level 
below, so that strength propagates upwards. In this way, structures such as (9) can 
represent three facts at once. Firstly, that [sje] and [di] are the stressed syllables in 
their respective words. Secondly, that [di] is the strongest syllable in the phrase. 
And finally, that armadillos is stronger than siete in it. Even though it might seem 
so from this example, metrical structures are not maximally binary branching, but 
n-ary branching in principle, in order to allow the assignment of exactly one strong 
position for level x-1 at each constituent of level x dominating it (Nespor & Vogel 
2007: 7). The metrical structure also assigns prominence at levels higher than the 
iP/PhP: in the utterance Juan encontró siete armadillos “Juan met seven armadillos”, 
it would also put siete armadillos in a strength relation with the rest of the 
utterance. How it would do that, however, would depend on how the utterance is 
phrased and on its information structure (cf. section 3.7). 
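
The equivalence of the two notations in (9) can be made concrete with a short sketch that derives grid column heights from a tree labelled with s and w. The class and function names are hypothetical, and the encoding of levels as a list is an assumption chosen for brevity; the sketch also checks that every constituent has exactly one strong daughter.

```python
from dataclasses import dataclass, field

LEVELS = ["syll", "foot", "word", "phrase"]  # bottom to top


@dataclass
class Node:
    level: str    # one of LEVELS
    strong: bool  # s (True) or w (False) relative to its sisters
    text: str = ""  # only filled in for syllables
    children: list = field(default_factory=list)


def check_one_strong(node):
    """Every constituent must have exactly one strong daughter."""
    if node.children:
        assert sum(c.strong for c in node.children) == 1, node.level
        for child in node.children:
            check_one_strong(child)


def head_syllable(node):
    """Follow the strong daughters down to the syllable heading this node."""
    while node.children:
        node = next(c for c in node.children if c.strong)
    return node


def syllables(node):
    if not node.children:
        return [node]
    return [s for child in node.children for s in syllables(child)]


def grid(root):
    """Return (syllable, column height) pairs from left to right."""
    height = {id(s): 1 for s in syllables(root)}  # one mark per syllable

    def visit(node):
        if node.children:
            head = head_syllable(node)
            level_height = LEVELS.index(node.level) + 1
            height[id(head)] = max(height[id(head)], level_height)
            for child in node.children:
                visit(child)

    visit(root)
    return [(s.text, height[id(s)]) for s in syllables(root)]


def syl(text, strong):
    return Node("syll", strong, text)


siete = Node("word", False, children=[
    Node("foot", True, children=[syl("sje", True), syl("te", False)]),
])
armadillos = Node("word", True, children=[
    Node("foot", False, children=[syl("ar", True), syl("ma", False)]),
    Node("foot", True, children=[syl("di", True), syl("ʎos", False)]),
])
phrase = Node("phrase", True, children=[siete, armadillos])

check_one_strong(phrase)
print(grid(phrase))
```

The column heights that result (3 for [sje], 2 for [ar], 4 for [di], 1 elsewhere) correspond to the number of x marks per syllable in the grid in (9)b.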


3.3.2 Word stress crosslinguistically 


The strongest position at the word level is particularly important. It is usually 
called “word stress”, “lexical stress” or simply “stress”, although this is a somewhat 
confusing terminology, since “stress” also signifies the particular way in which 
many (European) languages configure their prosody in order to mark this position 
both phonologically and phonetically. This typically includes increased duration 
and intensity on stressed syllables in comparison to unstressed ones (Fry 1955 for 
English, Ortega-Llebaria & Prieto 2007, 2011 for Spanish, Gordon & Roettger 2017 for 
a broad crosslinguistic overview), but the evidence for such claims has to overcome 
a considerable number of methodological pitfalls (cf. Roettger & Gordon 2017). The 
most relevant of those is to separate stress from pitch accent. For Spanish in par- 
ticular, Ortega-Llebaria & Prieto (2007, 2011) find that stressed syllables are longer 
and louder than unstressed ones even when not pitch accented. Acoustic cues to 
prominence are statistically robust in many languages, but in individual instances, 
they can often be missing or misleading. Yet experimental subjects reliably identify 
prominent positions, both at the word level and above, even when cues are less than 
ideal and, up to a point, even against the cues (Terken & Hermes 2000; Bishop 2012; 
Cole et al. 2019, cf. also the works cited in Calhoun 2010b: 4–5), and the expectation 
of a prominent position to occur increases attention to detail even when cues are 
absent (Zheng & Pierrehumbert 2010). All of this indicates that metrical structure 
should primarily be thought of in relation to this generation of expectations about 
prominence (Reich & Rohrmeier 2014), which can then be exploited for interpretative 
information structural effects (Ladd 2008; Calhoun 2010b; Bishop 2012), rather 
than with regard to its acoustic correlates. 

Language-specific stress-related phonological phenomena include e.g. the 
historical alternation in Spanish in the morphological paradigms of many words 
whereby /o/ and /e/ occur when a syllable is unstressed, but the diphthongized or 
pre-glided counterparts /we/ and /je/, respectively, occur when it is stressed, e.g. 
apostár – apuésto “to bet – I bet”, tenér – tiéne “to have – s/he has” (accent indicates 
stressed syllable). Historically, there is also a tendency to gradually reduce material 
following the stressed syllable (syncope), resulting in increasingly few word forms 
where the stressed position is followed by more than one further syllable. This is 
a particular instance of the much more general observation that stressed syllables 
in many languages are most resistant to reduction processes, both historically and 
in production, relatable also to the fact that more peripheral and sonorous vowels 
tend to occur in stressed position, while unstressed positions often have a more 
restricted set of less sonorous vowels in many languages (Crosswhite 2004). For an 
overview of stress-related phonological processes in various Romance languages, 
see Meinschaefer (in press). In contrast, there are languages that do not share in 
any or most of the properties associated with “stress” but still have a comparable 
unique syllabic position at the word level. The stereotypical example is Japanese, 
where about 45% of the words in the lexicon (Kubozono 2008: 167) have such a 
unique lexically specified position which can be anywhere in the word, but it is 
not marked by increased duration or intensity, or any other phonetic or phonolog- 
ical process except a characteristic pitch movement (Beckman 1986; Venditti et al. 
2008). Japanese is usually said to have a lexical (pitch) accent, which is somewhat 
confusing terminologically, because pitch accent is what the postlexically assigned 
tones linked to prominent syllables are called in AM. Leaving aside the issue of 
the phonetic and phonological correlate for the moment, word stress has two 
defining properties that directly derive from the nature of the metrical structure, 
culminativity and obligatoriness (cf. Liberman & Prince 1977: 263; Hayes 1995: 29; 
Hyman 2014: 60). Culminativity is the property that exactly one syllable can be 
stressed (= assigned highest prominence) in a word; as we have already seen, this is 
a property of metrical structure that holds at each level. Obligatoriness means that 
every lexical content word in a language must be stressable. This is a corollary of 
every utterance having a metrical structure that assigns strength relations at each 
level. Having more than half of the lexicon consist of words that are not accented is 
clearly a different matter than having a closed and relatively small set of function 
words that cannot be stressed: these can often be taken to be clitics and therefore never 
form prosodic words on their own (see Hualde 2007, 2009 for stress on function 
words in Spanish). Japanese then does not have word stress under this definition 
quite independently of its acoustic correlates, and in the analysis of our own data 
we will see that Huari Quechua shares some, but not all of the properties that lead 
to this conclusion. That does not mean that Japanese does not have a metrical struc- 
ture, it just means that the position of its lexical pitch accents does not necessarily 
have something to do with it. This is borne out by the fact that accented words are 
not in any sense more prominent in Japanese than unaccented ones. High vowels 
bearing the H tone of the lexical pitch accent are often reduced (Venditti et al. 2008: 
480–481), which would be unexpected in a stress system where the stressed syllable 
is the most prominent at the word level. The Japanese lexical pitch accent is simply 
a lexical specification that has the additional property of being culminative at the 
word level, similar to the distribution of aspirated and glottalized plosives in Cuzco 
Quechua (cf. (6), see also Hyman 2006: 238–239). Yet Japanese, and other languages 
that do not have word stress, clearly do have prominence above the prosodic word, 
so that one word is more prominent than the others in a phrase (cf. Venditti et al. 
2008 on Japanese, Roettger 2017: 135 on Tashlhiyt Berber). One theoretical solu- 
tion to account for this would be to assume that the relevant prosodic domain, the 
prosodic word, is not headed (Gussenhoven 2018: 398 on Ambonese Malay), which 
effectively means that metrical structure is not assigned at its level. Another option 
could be to assume that it is assigned a metrical structure, but this finds no, or only 
very little, expression in the language under discussion. The last option might be 
less abstract than it sounds: Hyman (2014: 57–58, 61) develops properties and functions 
that stress systems may have and shows that typologically, languages do not 
just cluster around having either all or none of these, but seem to occupy many of 
the available positions in between. 


(10) Properties of a highly stress-oriented phonological system (Hyman 2014: 
57-58) 
a. Stress location is not reducible to simple first or last syllable (which could 
simply represent a boundary phenomenon). 
b. Stressed syllables show positional prominence effects: 
   i. Consonant, vowel, and tone contrasts are greater on stressed syllables. 
   ii. Segments are strengthened in stressed syllables (e.g. Cs become aspirated 
   or geminated, Vs become lengthened, diphthongized). 
c. Unstressed syllables show positional non-prominence effects: 
   i. Consonant, vowel, and tone contrasts are fewer on unstressed syllables. 
   ii. Segments are weakened in unstressed syllables (e.g. Cs become lenited, 
   Vs become reduced). 
d. Stress shows cyclic effects (including non-echo secondary stresses). 
e. Stress shows rhythmic effects lexically/post-lexically (cf. the English 
‘rhythm rule’). 
f. Lexical stresses interact at the post-lexical level, e.g. compounding/phrasal 
stress. 
g. Lexical stress provides the designated terminal elements for the assignment 
of intonational tones (‘pitch-accents’). 
h. Other arguments that every syllable is in a metrical constituent which 
can be globally referenced. 


The properties in (10) all apply in the case of English, but as Hyman (2014) points 
out, other languages “care” far less about word stress than English. Spanish could 
be said to be less interested in word stress at least with regard to properties (10) 
b, c, and probably e. Thus it is also quite conceivable that a language might rank 
very low on these dimensions, with a stress system that serves only very few of 
these functions and leaves barely any trace, so to speak. Some languages have also 
been argued to have a stratified lexicon, with one set of lexical items exhibiting 
one configuration of properties, and a second set another (Hyman 2006: 228). We 
will return to a discussion about languages without stress in section 3.5.3. How 
the position of word stress is determined in different languages is the subject of a vast 
body of research (cf. van der Hulst 1999b; Goedemans et al. 2010 for surveys). Some 
stress systems seem very simple in their regularity, e.g. stress always occurs on 
the initial syllable of a word in Hungarian (if its expression is not blocked by postlexical 
processes, Siptár & Törkenczy 2007: 21-23). Other languages require much 
more complex metrical algorithms that can differ at least in whether they are 
quantity-sensitive (sensitive to syllable weight, e.g. as expressed in moras), whether 
feet are left- or right-headed (trochaic or iambic), whether stress is oriented 
towards the left or the right edge of the word, and whether they systematically 
ignore constituents at one of the edges (extrametricality, cf. Hayes 1995: especially 
54-61, 71-74). Together with additional morphophonological constraints e.g. on 
how certain classes of affixes interact with stress, this can then lead to systems that 
at first sight seem quite opaque, such as that of English, but they also allow for the 
exploitation of stress contrasts for minimal pairs (e.g. English óbject (noun) vs. 
objéct (verb), German Ténor “thrust of an idea” vs. Tenór “tenor voice”, Spanish 
páso “step / pass.1SG” vs. pasó “pass.3SG.INDEF”), which is of course impossible 
when stress is always assigned to the same position in a word. There are also pro- 
posals that the primary word stress in many languages is simply specified in the 
lexicon, but interacting with the metrical structure (cf. van der Hulst 1999a: 73-75). 
In Spanish, word stress always falls on one of the three final syllables in a content 
word, unless it is followed by two clitic pronouns (e.g. cuéntamelo “tell it to me”), 
but within this three-syllable window, the difference between stress on the antepenult 
(proparoxytonic), on the penult (paroxytonic) or on the ultima/final syllable 
(oxytonic) can encode lexical and grammatical contrasts, as seen above. Among 
these three options, paroxytonic words are by far the most common overall (almost 
75%), oxytonic words are only more common in the subgroup of words ending in a 
consonant other than /s/, and proparoxytonic words make up just a little less than 
6% of a corpus of the 4289 most frequent Spanish words, according to Eddington 
(2000: 96). Stress assignment in Spanish has been analyzed algorithmically (Harris 
1983, 1987; Roca 1997, 1999, 2006; Lipski 1997, among many others) as well as via 
an exemplar-based lexical approach (Eddington 2000). 
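
The three-syllable window can be stated compactly by counting the stressed syllable from the right edge of the word. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the words are syllabified by hand, enclitics are taken to lie outside the host word, and pájaro “bird” is an added example that is not discussed in the text.

```python
# Stress types inside the three-syllable window, counted from the right edge.
WINDOW = {1: "oxytonic", 2: "paroxytonic", 3: "proparoxytonic"}


def stress_type(syllables, stressed):
    """Classify the stress of a host word (enclitics excluded).

    syllables: list of syllable strings
    stressed:  0-based index of the stressed syllable, counted from the left
    """
    if not 0 <= stressed < len(syllables):
        raise ValueError("stressed index lies outside the word")
    from_right = len(syllables) - stressed  # 1 = ultima, 2 = penult, ...
    if from_right not in WINDOW:
        raise ValueError("stress falls outside the three-syllable window")
    return WINDOW[from_right]


print(stress_type(["pa", "so"], 0))        # páso   -> paroxytonic
print(stress_type(["pa", "so"], 1))        # pasó   -> oxytonic
print(stress_type(["pa", "ja", "ro"], 0))  # pájaro -> proparoxytonic
print(stress_type(["cuen", "ta"], 0))      # host of cuéntamelo -> paroxytonic
```

If the two clitics of cuéntamelo are counted as part of the word, the stressed syllable is the fourth from the right and the function raises an error, which corresponds to the exception mentioned above.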

Stress assignment can be subject to considerable variation within a language 
(cf. e.g. Behnstedt & Woidich 1985a, 1985b on substantial regular differences in 
regional varieties of Egyptian Arabic). Lexical items with low use frequency seem 
often to show more variability in their stress placement, which would indicate that 
stress position is at least partly lexicalized, but the relationship between the two 
seems to be quite indirect in many cases, at least in English (cf. Tokar 2017: 19, 
21-22). Since stress derivation is not the focus of this study, we will simply treat 
stress position as a given property of prosodic words in Spanish, part of the knowl- 
edge of speakers, and also mostly ignore the level of feet in representations hence- 
forth. For now, we resolve to call the culminative, obligatory highest prominence 
assigned by metrical structure at the level of the prosodic word “word stress” or 
“lexical stress”, without necessarily implying the concomitant bundle of phonological 
and phonetic properties attached to stress in many European languages. “Stress 
accent”, on the other hand, will be used when necessary to distinguish it from other 
kinds of lexical accent like the Japanese one. For the treatment of intonation, word 
stress is crucial in many languages: the position specified by it provides the other 
anchoring site, besides constituent edges, for tones in AM. Those tones assigned 
to positions designated as prominent by the metrical structure are called pitch 
accents. Before returning to the discussion about how intonational contours are 
related to prosodic landmarks crosslinguistically and in AM, we will look at what is 
known about metrical structure and word stress in Quechua. 


3.3.3 What is (not) known about stress in Quechua 


For Quechua, the facts about word stress are far less clear than in many European 
languages and making efforts at resolving them is one of the contributions of the 
present work. In overviews based almost entirely on impressionistic data, word 
stress is said to regularly fall on the penult, or on the initial syllable, in some of the 
Quechua I varieties (Cerrón-Palomino 1987: 128-129; Adelaar & Muysken 2004: 207–208; 
Wetzels & Meira 2010: 352-353). Focusing on those central Peruvian Quechua I 
varieties closest to Huari Quechua, for the Quechua varieties spoken in the depart- 
ment of Ancash, Parker (1976: 57-60) describes a complex, partly weight-sensitive 
system: broadly speaking, stress (which he equates with intensidad “intensity”) reg- 
ularly falls on the penult, as in other varieties. In exclamatives, it seems to be final, 
but he tentatively proposes that this is due to a particular exclamative intonation 
(cf. also Cusihuamán 2001: 79-81, who similarly describes the difference between 
declaratives and exclamatives in terms of intonation for Cuzco Quechua). In particular 
for the varieties spoken in the Callejón de Huaylas, the following details are 
given: in phrase-final words, a non-final heavy syllable ((C)VC or (C)V:) receives 
stress; or the initial syllable, if no heavy syllable is present. In words in a non-fi- 
nal position in the phrase, the initial syllable is stressed, or, in slow speech, one of 
the heavy syllables, if present, preferentially the prefinal one (Parker 1976: 58-59). 
Parker (1976: 59-60) also explicitly remarks on the inadequacy of applying notions 
of stress from Spanish to the Quechua he describes, because he observes a disjunction 
of phonetic cues: in a word consisting of three light ((C)V) syllables in final 
position in the phrase, the highest intensity is heard initially, but the highest pitch 
on the penult. A particularly exceptional case is also described: syllable structure 
does not allow closed syllables with a long vowel ((C)V:C) in this variety; when they 
occur morphologically, the vowel is shortened. For some exceptional lexical items, 
they do however surface word-finally, and Parker (1976: 56) states that they then 
also cause irregular oxytonic stress. For Conchucos Quechua, Stewart (1984) paints 
a picture similarly full of complex exceptions and optionality. In general, she also 
describes word stress to fall on the penult. Unlike Parker, she treats only syllables 
containing a long vowel as heavy, at least in all words that do not consist of three 
syllables (Stewart 1984: 195). Quantity-sensitivity acts variably, and differently on 
words of different length: in words of more than three syllables, a final heavy sylla- 
ble following a light penult can but need not attract stress, but in bisyllabic ones, it 
normally does not (Stewart 1984: 199, 201–203). No explanation for this optionality 
is offered. For trisyllabic words, the situation becomes more complicated. Only in 
them, closed syllables ((C)VC) also count as heavy. They attract stress in a positional 
hierarchy: if the penult is heavy, it is stressed. If it is not, then the initial syllable 
is stressed if heavy. If that is also not the case, then the final syllable may receive 
stress if heavy (Stewart 1984: 205-206). In all words, when both final syllables are 
heavy, then the penult is stressed. Words where all three syllables are light, such as 
yaku-ta (water-OBJ) “water (obj.)” or hacha-man (tree-DIR) “towards the tree”, are 
variably stressed either on the initial syllable or the penult (Stewart 1984: 198–199). 
This last case sounds similar to what Parker describes for such trisyllabic words 
(see above), but unlike Parker, Stewart does not differentiate between intensity and 
pitch. It is imaginable that some of the unexplained optionality, based presumably 
on what she as the analyzing linguist perceives as prominent, might be resolved when 
considering phonetic cues individually. 
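
To make the amount of optionality in this description easier to see, the rules reported above for Conchucos Quechua can be written out as a function that returns the set of syllables on which stress may fall. This is a sketch of our summary of Stewart (1984), not of her own formalization. The encoding of syllable types, the treatment of monosyllables, and the assumption that the penult is the alternative whenever an optionally stressed final syllable is not stressed are ours.

```python
LIGHT, CLOSED, LONG = "CV", "CVC", "CV:"  # hypothetical syllable-type labels


def is_heavy(syllable, word_length):
    """Long vowels are always heavy; closed syllables only in trisyllables."""
    if syllable == LONG:
        return True
    return syllable == CLOSED and word_length == 3


def stress_candidates(sylls):
    """Return the set of 0-based positions where stress may fall."""
    n = len(sylls)
    if n == 1:
        return {0}  # assumption: monosyllables are not covered in the summary
    heavy = [is_heavy(s, n) for s in sylls]
    penult, final = n - 2, n - 1
    if heavy[penult]:
        # Includes the case where both final syllables are heavy.
        return {penult}
    if n == 2:
        return {penult}  # a final heavy syllable normally does not attract stress
    if n == 3:
        if heavy[0]:
            return {0}
        if heavy[final]:
            # "may receive stress"; the penult as alternative is an assumption
            return {final, penult}
        return {0, penult}  # all light: initial syllable or penult
    # More than three syllables, light penult.
    if heavy[final]:
        return {penult, final}  # final heavy can but need not attract stress
    return {penult}


print(stress_candidates([LIGHT, LIGHT, LIGHT]))  # e.g. yaku-ta
```

Every return value with more than one element marks a point of unexplained optionality in the description.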

Unlike the previous studies, Hintz (2006) uses instrumental acoustic data in 
her study on stress in South Conchucos Quechua. She does not find evidence for 
weight-sensitivity, even though the varieties studied by her and Stewart (1984) 
overlap.²⁴ Based on data from four speakers, she finds that stress is word-initial in 
spontaneous connected speech, with a secondary stress on the penult. In isolated 
words and under conditions of “emphasis”, the situation reverses, and primary 
stress is now located on the penult, with secondary stress on the initial syllable. Her 
perceptions are partially confirmed by judgments by three of the same speakers, 
who were asked to identify prominent positions on a subset of the recorded words. 
Both in her own judgment and that of the native speakers, variation occurred in 
general, but in particular in questions and exclamatives, where stress was found 
to have moved to the final syllable of the word (Hintz 2006: 488). Identical word 
forms containing the suffix -shun (1.PL.INCL.FUT), such as aywakushun “we will 
go/let's go” were found to have different stress patterns depending on whether the 
speaker intended to express agreement or a request (Hintz 2006: 490). Vowels in 
a voiceless environment are occasionally observed to be voiceless, with a large 
majority of them (>80%) occurring word-finally (Hintz 2006: 498, cf. Delforge 2011 
for a similar phenomenon in Cuzco Quechua). To explain part of the remaining variation 
in stress perception, Hintz (2006: 499-500) proposes to view such devoiced 
syllables as “optionally extrametrical”, i.e. optionally not entering into the stress 
assignment algorithm, giving a number of suffixes where she has found this to 
be the case. Interestingly, Stewart (1984: 209) also describes extrametricality, but 
directly relates it to certain morphemes, which do not enter into the stress assign- 
ment algorithm, again explicitly optionally. The list of candidate syllables mentioned 


24 “South Conchucos Quechua” is a label for Quechua I varieties spoken in the southern part of 
the Conchucos valley, including Huari province and town, by about 250,000 speakers according to 
Hintz (2006: 478). Her speakers are from Huaripampa, a small community outside of the town of 
San Marcos, about an hour's drive by car away from the town of Huari. “Conchucos Quechua”, the 
label used by Stewart (1984), seems to be more comprehensive, but she does not indicate where 
precisely her data is from. In Stewart (1987: 5-8), she explains however that “Conchucos Quechua” 
is spoken in an area bounded to the north by the town of Pomabamba, and to the south by that of 
San Marcos, by about 200,000 speakers, which is largely coterminous with the area described by Hintz 
(2006) to fall under the label “South Conchucos Quechua”. A large part of the data in Stewart (1987) 
comes from a community close to Pomabamba, which is a car drive of 3-5 hours away from the 
town of Huari. If the description in Stewart (1987) also applies to Stewart (1984), then her data is 
thus from the northern edge of the region in which “South Conchucos Quechua” is spoken, Hintz’ 
data is from its southern edge, and the data on which the present study is based is from a more 
centrally southern part of it. It is not known how much prosodic variation exists within this area. 

by the two authors mostly does not overlap, only -ta (OBJ) and -qa (TOP) are 
mentioned by both. Primary, secondary and unstressed syllables were found to be 
significantly different according to the cues of F0, intensity, and duration in Hintz' 
analysis. However, only for one speaker was there actually a difference between 
all three conditions, via pitch height; both pitch height and intensity were differ- 
ent in the data of all speakers between stressed and unstressed syllables; duration 
was the least reliable cue (Hintz 2006: 505–506). This last observation is somewhat 
expected: as vowel length is phonemic in Quechua I varieties, it is perhaps less 
likely to be used additionally as a cue to stress. 
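
The context-dependent pattern reported by Hintz (2006) can be sketched in the same way. The function below is only a schematic restatement of the summary given above: its name and parameters are hypothetical, and modelling optional extrametricality as removing a devoiced final syllable from the domain before the penult is located is our assumption about how the proposal would apply.

```python
def hintz_stress(n_syllables, connected_speech=True,
                 final_devoiced=False, treat_as_extrametrical=False):
    """Return (primary, secondary) as 0-based syllable indices.

    secondary is None when both stresses would fall on the same syllable.
    """
    if n_syllables < 1:
        raise ValueError("a word needs at least one syllable")
    domain = n_syllables
    if final_devoiced and treat_as_extrametrical:
        domain -= 1  # assumption: the devoiced final syllable is skipped
    initial = 0
    penult = max(domain - 2, 0)
    if connected_speech:
        primary, secondary = initial, penult
    else:  # isolated words and "emphasis"
        primary, secondary = penult, initial
    if primary == secondary:
        secondary = None
    return primary, secondary


# aywakushun, hand-syllabified as ay.wa.ku.shun (four syllables)
print(hintz_stress(4, connected_speech=True))   # (0, 2)
print(hintz_stress(4, connected_speech=False))  # (2, 0)
```

Because extrametricality is optional in the proposal, the last two parameters have to be set by hand: the description does not predict when a devoiced syllable is skipped.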

As already pointed out by Parker (1976) and also by Hintz (2006), many of these 
observations would benefit from a treatment that separates phonetic cues, intona- 
tional phenomena, and word stress. In our own Huari Quechua data, we found a 
situation where our non-native perceptions were very heterogeneous. In multiple 
recordings of the same lexical items, impressions of highest prominence did not 
agree between recordings and hearers; in many words, prominence could be heard 
on different syllables depending on which cue we focused on. We therefore decided 
to refrain from relying too much on our own perceptions which are apparently 
biased from exposure to languages with other prosodic systems. In a study based 
on the speech of two speakers, Buchholz & Reich (2018) investigated whether acoustic 
cues (pitch height, pitch range, duration and intensity), both individually and taken 
together, served to make syllable positions stand out from their environment. We 
did not find consistent evidence across words of different length that either the 
penult or the initial syllable in the word was cued to stand out relative to the others, 
except for an indication that the penult seemed to have slightly higher intensity 
values overall (Buchholz & Reich 2018: 147, 155). CVC-syllables in the word penult 
also were found to be somewhat lengthened in this position, but not CV-syllables in 
the same position (Buchholz & Reich 2018: 153-155). On the other hand, we found 
that pitch height formed a distinctive pattern on phrases, operationalized as mate- 
rial between pauses, such that in general a gradual rise from the beginning of the 
phrase was observed, until a more abrupt fall taking place across the last two sylla- 
bles of the phrase (Buchholz & Reich 2018: 151-153). 

There are several indications that what has been described as word stress 
might partially consist of postlexical intonational phenomena. First, from a func- 
tional perspective it seems inefficient to have a system of stress assignment that 
is as complicated as described above yet does not fulfill a distinctive function: 
all descriptions agree that stress in Quechua is not lexically distinctive. Secondly, 
Parker (1976) and Cusihuamán (2001) state that the "stress shift" in exclamatives 
is likely an intonational phenomenon; the same argument could also be made for 
the "stress shifts" due to pragmatic conditions observed by Hintz (2006), namely 
that these are possibly due to different pitch accent and boundary tone configurations 
at a phrasal level. Boundary tones adjacent to a phrase edge might also better 
explain why voiceless syllables do not seem prominent, because such syllables are 
insufficient landing sites for tones. Thirdly, Stewart (1984: 207-209) gives evidence 
that the domain of stress assignment is not isomorphic with the morphological 
word (cf. (11)). 


(11) Conchucos Quechua data points that necessitate a domain of stress assignment 
larger or smaller than the morphological word, according to Stewart (1984: 
208) 

a. áchikày mikükuskínàa 
achikay miku-ku-ski-naa 
Name    eat-MID-ITER-PST 
“The Achikay [wicked old woman] ate it all up” 

b. hákàakunáta 
hakaa-kuna-ta 
guinea.pig-PL-OBJ 
“the guinea pigs (obj.)” 

c. kéedanantsíkpaqkáqta 
keeda-na-ntsik-paq ka-q-ta 
stay-NMLZ-1.INCL-BEN COP-AG-OBJ 
“that which is to stay for us” 

d. tarintsikpistsu 
tari-ntsik-pis-tsu 
find-1.INCL-ADD-NEG 
“we don’t find it at all” 


All the examples in (11), where the acute accent (´) marks primary and the grave 
accent (`) secondary stress, are attested but incompatible, according to Stewart 
(1984: 208), with her stress algorithm unless the domain of stress assignment is 
not the individual (morphological) word. In (11)a, the words achikay, the name of 
a wicked woman from folklore, and mikukuskinaa “s/he ate”, are both full content 
words, yet if they were assigned stress independently according to Stewart’s system, 
then the initial syllable of mikukuskinaa would have to have primary stress. In contrast, 
(11)b-d lead Stewart (1984: 208) to conclude that the morphemes -kuna (noun 
plural), ka- (copula)²⁵ and “frequently” -pis (additive meaning, “also” or “even”) 


25 Stewart (1984) treats kaq as a “definitivizer” and apparently as a dependent morpheme. As 
indicated in the glosses, I treat it as a combination of the copula verb ka- and the agentive 
nominalization -q, and also as morphologically independent because it can occur on its own (which 
form their own domains of stress assignment. If we take a postlexical perspective, 
both types of cases can be seen as indicative of correlates of phrasing or postlexical 
pitch accenting rather than of lexical stress. 

Finally, formulations such as that "stress can be distributed over several long 
syllables" (Adelaar & Muysken 2004: 207) also indicate that what has been described 
as stress in much of the impressionistically based literature on Quechua is probably 
to be understood as a broad label collecting a number of suprasegmental phenomena, 
rather than culminative word stress as defined in the previous section.²⁶ The 
results in Buchholz & Reich (2018) about an identifiable pitch trend (different from 
declination) across a phrase-like unit can also be seen as an indication that at least 
pitch movement is sensitive less to stress as a word-level phenomenon than to a unit 
above the prosodic word. 

Hyman (2006: 246-247) cautions against interpreting any prosodic differences 
observed in an unfamiliar language via the lens of stress accent familiar to the 
analyzing linguist from European languages. Citing the example of Indonesian, he 
proposes the heuristic that “if word-stress is so hard to find, perhaps it is not there 
at all". Indonesian has received a number of conflicting stress-based accounts, yet 
van Zanten et al. (2003); van Zanten & Goedemans (2009) showed that what these 
accounts had taken to be correlates of word stress seems most likely to be pitch 
accenting independent of word stress, instead seeking proximity to a phrasal edge. 
Gordon (2014: 111-112) even estimates that perhaps a majority of word stress- 
based accounts especially of lesser-studied languages actually reflect such systems 
of phrasal, not lexical, prominence realized via pitch events that seek to occur 
close to phrasal edges. In this vein, as an alternative hypothesis to the complex and 
optionality-heavy stress-based accounts reviewed here, I will consider the possibility 
that Huari Quechua has no word stress, or at least only "cares" very little about 
it, following the characterization by Hyman (2014). In chapter 6, I will present data 
in evidence for this hypothesis and also provide an analysis of the intonational 
phenomena of Huari Quechua that only marginally needs to make reference to a 
lexical stress position. 


-kuna and -pis cannot). For an in-depth treatment of the functions of ka-q and its derived forms, 
see Bendezú Araujo (2021). 

26 See also Wetzels & Meira (2010: 314-315), who state that the definitions for phenomena like 
stress or pitch accent in descriptions of suprasegmental phenomena in South American indigenous 
languages in general are often vague. They make it clear that much more research is needed before 
reliable generalizations can be made. 

3.3.4 Pitch accents and arriving at phonological tones from pitch contours 


Returning to the discussion of how to relate the pitch contour and the text to the 
tones using the example of the rise-fall-rise contour, we can now see that the position 
of the first rise in (8)a-c is clearly linked to the stressed syllable in the words 
Bob, heir, and golfing, respectively. Specifically, it is linked to the stressed syllable of 
that word which is most prominent in the entire IP. In some languages, among them 
English and German, it is only the most prominent word in a phrase whose stressed 
syllable has to be pitch accented. Because of this link to a prominent position, such 
pitch accents are sometimes called "prominence-lending" (e.g. Welby 2006: 364). 
They have also sometimes been confused with a direct correlate of stress, but the two 
should be kept apart: syllables that are not pitch accented but stressed are still often 
longer and louder than unstressed syllables. In languages with stress, a stressed 
syllable is a necessary condition for prominence-lending pitch accents to occur, but 
not a sufficient one. In other languages, this does not have to be the case. Continu- 
ing with our example, we can see that it is the strongest position in the phrase that 
receives the pitch accent because in he's golfing tomorrow, it could also occur on 
another word than where it does in (8)c, e.g. on tomorrow (see (12)). 


(12) Metrical structure and rise-fall-rise contour on he's golfing tomorrow with 
highest prominence on tomorrow 


                x       PhP/ip/φ
      x         x       prosodic word (ω)
      x         x       foot (F)
 x    x  x   x  x  x    syllable (σ)

He's golfing tomorrow

               L*H L-H%


That would, however, indicate another context:²⁷ e.g. one in which the parliamentary 
debate is still taking place the next day, but the speaker has just asked when the 


27 Note that in principle, the same thing could be done with (8)b, e.g. moving the highest prominence 
and also the pitch accent to the stressed syllable on throne. However, that would imply a 
context in which long-lost heirs to different things are contrasted with each other. Stereotypically, 
long-lost heirs are often those to thrones (“throne” is by far the most frequent collocate to “heir to” 
in a four-word window to the right of “heir” in the >13 billion-word web corpus English Web 2015 
as searched via Sketch Engine (Jakubíček et al. 2013; Kilgarriff et al. 2004)), so such a context is 
simply not highly likely from our knowledge of the world. 
president will be going golfing next and has been told it will be tomorrow. In this 
context, (8)c would clearly be odder than (12). However, the other way around, (12) 
in the context given for (8)c would arguably be more acceptable, because there is 
a general preference for locating the highest prominence rightmost (cf. Ladd 2008: 
252). This hints at the complex influence that metrical structure, pitch accenting 
and context have on the interpretative categories of focus and contrast, which 
will be treated in more detail in section 3.7. Note that the “incredulity” aspect of 
the meaning conveyed (and made plausible by the context) does not change, just 
the location at which alternatives to (8)c and (12) would have to differ from them, 
respectively, in order to be less incompatible with the speaker's expectations. That 
it can change, however, is evidence that it does not suffice to simply say that “the 
position of the first rise is linked to the stressed syllable” in a word, as we just did 
above. Instead, the linking relationship has to be established between the individ- 
ual tones making up the rise, the L and the H. If the rise is timed so that the pitch 
trough preceding it (caused by the presence of the first L tone) extends throughout 
the stressed syllable of the most prominent word, then the “incredulity” reading 
obtains, while if most of the rise, sometimes including its peak, takes place on the 
stressed syllable, then a different reading obtains which is also contrastive but 
without the additional meaning of “incredulity”, cf. a) and b) in Figure 4. 

That this difference in relative timing between text and tune creates a difference 
in perception that for the majority of speakers seems to be close to categorical²⁸ 
was first established by Pierrehumbert & Steele (1989) in a categorical perception 
and imitation task. Providing contexts that make one or the other realization 
more felicitous (Pierrehumbert & Steele 1989: 185) and thus demonstrating that 


28 Note that of their five test subjects, one did not reproduce any difference in contours between 
the two conditions on average. Pierrehumbert & Steele (1989: 190) readily ascribe this to the L*H 
pitch accent not occurring in the speaker's tonal inventory. It has since been amply demonstrated that 
intonation can vary considerably also between experimental tasks, speaking styles and various social 
categories, even within speakers from a locally relatively restricted area, and also show effects 
of interlocutor accommodation (cf. Face 2003; Henriksen 2013; Romera & Elordieta 2013; Huttenlauch 
et al. 2016; Huttenlauch et al. 2018; Martín Butragueño & Mendoza 2018; Vanrell & Fernández 
Soriano 2018, to cite only a few recent works on Spanish). Furthermore, categorical perception 
experiments have been less successful in other instances (see also Gerrits & Schouten 2004 for a 
critical assessment of the paradigm). Overall the conclusion seems to solidify that for intonation, 
the relation between form and meaning is often distributional and probabilistic, rather than strict- 
ly categorical: realizations of two meaning categories often show a bimodal yet also clearly over- 
lapping distribution (meaning that statistically, differences with clear trends do emerge, but two 
randomly chosen instances from each of the categories are relatively likely to be similar or even to 
counteract the trend); they can also differ substantially in their internal homogeneity, all of which 
seems to firmly locate variability at the heart of what intonation is (cf. Ladd 2008: 150-154, 2014; 
Cangemi & Grice 2016; Roettger 2017). 

[Pitch track panels: a) Only a millionaire; b) Only a millionaire] 


Figure 4: Two pitch track contours of only a millionaire, with the “incredulity” reading (a, L*H L-H%) 
and with a contrastive reading without “incredulity” (b, LH* L-H%), adapted²⁹ from Pierrehumbert & 
Steele (1989: 182). 


the contours in Figure 4 can be intonational minimal pairs, they propose to encode 
their difference phonologically via specifying which of the two tones in the first 
rise is linked to the stressed syllable in millionaire. This link between the TBU of 
the stressed syllable and one of the tones of the pitch accent is called association, 
and it is marked in the phonological annotation by the asterisk after that tone (see 
Figure 4). Note that even in the contour in Figure 4 b), where the association holds 
between the stressed syllable and the H tone (LH*), the pitch peak only occurs quite 
late in [mil]. Association of a starred tone with a stressed syllable does not necessar- 
ily mean that the pitch peak always occurs within that syllable, but that the timing 
of its tonal event can be reliably defined relative to its position (Arvaniti et al. 2000, 
2006; Ladd 2006, 2008). How the other types of tones in the contour are related to 
the segmental string will be treated in section 3.5. It is a reasonably good initial 
heuristic in AM-based intonation analysis to take pitch turning points (peaks, i.e. 
local maxima, and shoulders, points which form the left or right edges of high pitch 
plateaux, for high tones; troughs, local minima, and elbows, points which form the 
left or right edges of low stretches of pitch, for low tones) as indication for tonal 
targets. Yet there are many factors that impede this simple equation (cf. Ladd 2008: 
134-138; Barnes et al. 2010). For one, it is well known that voiceless obstruents 
tend to locally raise F0 while voiced obstruents will lower it (so-called consonantal 
microprosody even though its effects can be quite large, cf. Ohala 1978; van Santen 
& Hirschberg 1994; Hanson 2009; Kirby & Ladd 2016; Ladd & Schmid 2018). These 
have to be discounted when trying to deduce the presence of tones from the pitch 
signal. Then, as we have seen, quite subtle but inarguable pragmatic contrasts can 
affect pitch in also quite subtle but perceptible ways. Phonological context can also 


29 It is problematic to obtain open access reproduction rights for pitch tracks originally published 
under more restrictive copyright schemes. I therefore use schematized adaptations throughout in 
these cases. 
play an important role, e.g. when the proximity of other tones or prosodic boundaries 
causes tonal targets to undershoot (truncation, which can vary between languages 
and prosodic positions, cf. Rathcke 2013, 2016; Cho & Flemming 2015). Thus, 
inferring tones from the pitch signal can only be done with some certainty once 
the relevant landmarks in the prosodic structure and the tonal inventory of the 
language are known. Both the relevant landmarks as well as the tonal inventory 
can vary considerably between languages. 
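
The initial heuristic of reading candidate tonal targets off pitch turning points can be sketched in a few lines of Python. This is a minimal illustration only, not a procedure used in this study: the function name `candidate_targets` and the F0 values are invented for the example, the contour is assumed to be already interpolated and freed of consonantal microprosody, and shoulders and elbows (plateau edges) are deliberately ignored.

```python
from typing import List, Tuple


def candidate_targets(f0_hz: List[float]) -> List[Tuple[int, str]]:
    """Return (sample index, tone) pairs for strict local extrema.

    Peaks (local maxima) are proposed as H, troughs (local minima) as L.
    The first and last samples are skipped because they have only one
    neighbour. Plateau edges (shoulders, elbows) are not detected.
    """
    targets = []
    for i in range(1, len(f0_hz) - 1):
        left, mid, right = f0_hz[i - 1], f0_hz[i], f0_hz[i + 1]
        if mid > left and mid > right:
            targets.append((i, "H"))
        elif mid < left and mid < right:
            targets.append((i, "L"))
    return targets


# Invented F0 samples in Hz (hypothetical data, for illustration only).
contour = [180.0, 175.0, 170.0, 190.0, 220.0, 210.0, 185.0, 195.0, 215.0]
print(candidate_targets(contour))  # [(2, 'L'), (4, 'H'), (6, 'L')]
```

A list of this kind yields candidates only. As discussed above, each candidate still has to be checked against microprosodic perturbations, pragmatic context, truncation, and the tonal inventory and prosodic landmarks of the language in question.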


3.4 Intonation systems typologically 


In the following, we will explore how languages vary in their intonation systems. 
The goal is to arrive at a variation space mapped out by what is known about pro- 
sodic typology. This will help us avoid a treatment of Huari Quechua and Huari 
Spanish only in terms of what is known about other Spanish varieties or other 
European languages. 


3.4.1 Domains of tone assignment 


The very broad typological division of languages into “tone” and “intonation” 
languages³⁰ is partially based on the domain of tone assignment. Tone languages 
assign tones at the level of the syllable or the prosodic word, but more crucially, dif- 
ferent combinations of segmental strings together with tonal contours can encode 
distinctive lexical meanings in them. In intonation languages, with which we are 
here mostly concerned, tones never carry lexical meaning; instead they can convey 
meanings that are orthogonal to the lexical meaning. In this sense, they are always 
postlexical in these languages. However, while this distinction is useful in itself, it 
would be misleading to characterize languages as only belonging to either cate- 
gory. See e.g. the studies in Downing & Rialland (2017) on how intonation interacts 
with tone in African tone languages. Languages that are not tonal according to this 


30 This division is used here only descriptively. The prosodies of many languages pose problems 
for this dichotomic typology, in particular the so-called “lexical pitch accent languages” among 
which Japanese, Basque and Turkish have sometimes been counted; but also many African tone 
languages, or languages like Egyptian Arabic or West Greenlandic, which assign tones to a unit 
that looks like a word but without conveying lexical meaning. Hyman (2014) proposes a prosodic 
typology that does not “pigeon-hole” languages, but instead allows them to be classed along several dimensions 
via a number of criteria that do not necessarily have to cluster. See also Hyman (2009, 
2014, 2018); Beckman & Venditti (2010) for extensive discussion. 
definition (and their descriptions) are also sometimes described as assigning tones 
at the level of the prosodic word (Egyptian Arabic according to Hellmuth 2005, 
2007; West Greenlandic according to Arnhold 2014). It is not entirely clear, however, 
whether the unit identified as the prosodic word in these descriptions is really distinct 
from a small phrasal domain such as the AP or PhP.³¹ The tonal movements 
resulting from the placing of (combinations of) tones at prosodic constituent edges 
help to identify these constituents. However, this identification is not a 
straightforward matter. In the following section we will briefly consider how pitch 
cues can signal prosodic constituency beyond a delimitative and culminative place- 
ment of phonological tones. 


3.4.2 Tonal scaling and other gradual cues for prosodic constituency 


Besides marking the edges of prosodic units with tones, thus delimiting these units 
and making the domain they belong to identifiable, languages also use further 
tonal cues for the delimitation of prosodic units. They are related to tonal scaling, 
the height of a tonal event in relation to another or a reference point. Each human 
speaker has a typical vocal range for speech, the interval on the acoustic spectrum 
on which they are most comfortable speaking and at which they have the 
greatest control over pitch modulations, differing according to gender and indi- 
vidual. Within this range, pitch register, the global height of the pitch of an utter- 
ance, can still differ according to social context or emotional state of the speaker 
(Ladd 2008). Pitch registers can differ either in pitch level, if both the lowest and 
the highest points of a contour are lowered or heightened together, or in pitch 
span, if the lowest and highest points are moved closer towards or further away 
from each other. Within a given register, speakers have a lower and an upper base- 
line, in relation to which individual tonal events are scaled; this local scaling is 
called pitch excursion. The baselines normally fall gradually over the course of an 
utterance; this is called declination and is likely due to decreasing subglottal pressure 
over an exhalation (Pompino-Marschall 2009: 246-247). It has been shown 
for some languages (Pierrehumbert 1980; Liberman & Pierrehumbert 1984 for 
American English, Prieto et al. 1996 for Mexican Spanish) that at the very end of 


31 In particular, Arnhold (2014: 221-222) argues for the prosodic word as the domain of tone as- 
signment mainly on the grounds of it being identical to a morphological or syntactic word. Howev- 
er, since individual words can also be realized without tones (Arnhold 2014: 222), the tones could 
also be analysed as belonging to a phrasal category that is usually, but not always, isomorphic with 
a morphosyntactic word (which can also be quite long in West Greenlandic). Solving this argument 
will most likely always depend on one's theoretical stakes. 
some utterances, pitch events take place at a lower local level than projected from 
the declining lower baseline; this has been called final lowering and is one of the 
ways in which pitch scaling can be used to cue the edges of prosodic units. Pitch 
reset, the return of the pitch level to the initial baseline, is the second. Pitch reset 
often occurs between larger chunks of speech and is proposed by Himmelmann 
et al. (2018: 239-240) to be a universal cue, together with pauses, for what they call 
the "phonetic IP", a universally present speech unit which serves as the basis for 
the language-specific development of "phonological IPs". The third is systematic 
manipulation of the pitch level according to prosodic constituency. We will return 
to this last issue in section 3.6. The suggestion by Himmelmann et al. (2018) that 
there is both a “phonetic” and a “phonological” IP in some languages is targeted at 
the open question whether the different types of prosodic cues, such as segmental 
processes, phonetic cues such as initial strengthening and final lengthening, tonal 
cues based on the location of phonological tones, and tonal cues based on pitch 
scaling and pauses, actually align in signaling the same prosodic units. This is an 
ongoing discussion. It is evident that the consistent assignment of segmental phonological 
processes to prosodic domains has led to a proliferation of them. Prosodic 
domains also seem to differ in the way they manifest: e.g., unlike other prosodic 
domains, there is quite robust articulatory evidence for the syllable even though it 
is perhaps not the domain of phonological processes in all languages (Hyman 2015). 
Pauses, sometimes seen as the quintessential signal that a part of speech has ended, 
have been shown to also occur in hesitations or as indicators of increased cognitive 
load, instead of just at the end of larger prosodic groups and not even always there 
(Cruttenden 1997: 30-32; Frota et al. 2007; Seifart et al. 2018). Presumably, such 
points will all have to be considered in order to eventually resolve the issue of 
how many prosodic hierarchies there are and how they are related. However, since 
solving this problem is far beyond the scope of this work, I will here follow Frota 
(2012: 257-258) in her assessment that there is good evidence for a convergence at 
least between the segmental processes and the tonal cues to prosodic structure, as 
well as pauses, at least on average. I will take this as an optimistic methodological 
heuristic, undergoing constant re-evaluation. It is also quite evident that languages 
differ in the specific weight they assign these various cues. In the following section 
we will leave these murky waters and consider the differing types of tonal inven- 
tories across languages. 
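
The scaling notions introduced in this section can be made concrete with a small Python sketch. It rests on assumptions that are not part of the text above: pitch is converted to semitones relative to an arbitrary 100 Hz reference, pitch level is approximated as the midpoint between the lowest and the highest point of a contour, pitch span as the distance between them, and declination as the least-squares slope of a line through low turning points. All function names and numerical values are illustrative.

```python
import math
from typing import Dict, List


def to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert Hz to semitones relative to an (arbitrary) reference."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def register(f0_hz: List[float]) -> Dict[str, float]:
    """Approximate pitch level (midpoint) and pitch span (max minus min)."""
    st = [to_semitones(v) for v in f0_hz]
    low, high = min(st), max(st)
    return {"level": (low + high) / 2.0, "span": high - low}


def slope(times_s: List[float], values: List[float]) -> float:
    """Least-squares slope; negative values indicate a falling baseline.

    Assumes at least two distinct time points.
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_s, values))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den


# Hypothetical contours: the second has the same shape at a higher level.
neutral = register([100.0, 200.0])
raised = register([120.0, 240.0])
print(raised["level"] > neutral["level"])            # True
print(abs(raised["span"] - neutral["span"]) < 1e-9)  # True

# Hypothetical low turning points (semitones) at 0, 1 and 2 seconds.
print(slope([0.0, 1.0, 2.0], [10.0, 8.0, 6.0]))  # -2.0
```

In this toy case the two contours differ in pitch level but not in pitch span, and the lower baseline declines by two semitones per second. Final lowering would show up as a low target lying below the value projected from such a fitted line.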


3.4.3 Tonal inventories 


Research based on the AM model, by now the dominant approach, has produced 
intonational analyses of a considerable number of languages and varieties: see Jun 
(2005c, 2014a) and the works therein for a typologically broad overview, Frota & 
Prieto (2015) and the works therein for an overview over intonational variation in 
Romance languages, and Prieto & Roseano (2009-2013, 2010) as well as the works 
therein and in particular Sosa (1999, 2003); Beckman et al. (2002); Estebas Vilaplana 
& Prieto (2008); Hualde & Prieto (2015) on intonation in Spanish varieties. Jun 
(2005a) classifies languages according to whether they assign tones to prominent positions 
and edges of prosodic domains, or to edges only, and to my knowledge it still holds 
that no language has been described as not assigning tones to the edge(s) of at least 
one prosodic domain. However, languages differ considerably with regards to the 
paradigmatic tonal inventory that they employ according to available descrip- 
tions. AM only assumes two underlying tones, a high (H) and a low (L) one, but 
for many languages it has been argued that bi- or even tritonal (e.g. Gabriel et al. 
2010 for Argentinian and García 2016 for Peruvian Amazonian Spanish) complexes 
are assigned to a single position under certain conditions. As an aid for facilitat- 
ing comparison between data, a transcription system has been developed based on 
the AM description of American English by Pierrehumbert (1980), called Tones and 
Break Indices (ToBI, Beckman & Hirschberg 1994; Beckman et al. 2005). ToBI con- 
ventions for a wide range of languages have been developed that all use a related 
set of symbols. An asterisk following a tone (e.g. H*, L*, a "starred" tone) signifies 
that the tone is a pitch accent and associated with a tone-bearing unit (TBU) in a 
prominent position (a stressed or accented syllable). TBUs are usually taken to be 
either the rhymes of syllables, or moras, depending on the language (cf. Gussen- 
hoven 2004, 2018). Languages sometimes make more fine-grained differentiations 
than that, e.g. only allowing consonants of sufficient sonority as TBUs (cf. the situation 
in Tashlhiyt Berber according to Roettger 2017). In bi- or tritonal pitch accents 
the tones are either connected with a plus sign (+) or simply written together (the 
convention followed here), with the tone associated with the prominent position 
marked by the asterisk (e.g. LH*/L+H*, L*H/L*+H, the tones not followed by * are 
also called "unstarred"). Unstarred tones preceding the starred tone in a more than 
monotonal pitch accent are called leading tones, those following are called trail- 
ing tones. Boundary tones of the smaller phrasal level (ip or AP) are marked by 
a minus sign (e.g. H-, L-), those of the higher phrasal level (IP) with a percent sign 
(e.g. H%, L%). Placing the minus or the percent sign after the tone symbol indicates 
a boundary tone at the right edge, placing it before it one at the left edge 
of a phrase (e.g. -H, %H vs. H-, H%). The specific status of starred and unstarred 
tones in pitch accents as well as boundary tones and their phonetics and phonol- 
ogy will be discussed below, in section 3.5. There are further symbol conventions 
used in some language-specific ToBI schemes, the most frequent of which are ! for 
downstep and ¡ for upstep (e.g. !H, ¡H), intended to mark a tone whose excursion is 
markedly lower (downstepped) or higher (upstepped) relative to its surroundings. 
SpToBI, the ToBI system developed for Spanish (Beckman et al. 2002; Face & Prieto 
2007; Estebas Vilaplana & Prieto 2008; Aguilar et al. 2009; Hualde & Prieto 2015) also 
specifically marks delayed rising peaks, i.e. pitch accents occurring in prenuclear 
or prefinal position in which the peak is realized after the stressed syllable but 
the H tone is taken to be associated with it, using the “smaller than” symbol between 
the L and the H tone (L<H* / L+<H*). Whether tonal scaling (down/upstep) is best 
encoded paradigmatically at the level of individual tones in all cases and the inde- 
pendent status of the delayed rising pitch accent in Spanish are both somewhat 
contested issues. The former will be discussed in section 3.6, while we discuss the 
latter in the next section. 
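
The notational conventions just described are regular enough to be read mechanically. The following Python sketch decomposes a single label into its parts; it is an illustration of the conventions as summarized in this section, not an implementation of any official ToBI tool. The names `Label` and `parse_label` are invented here, only the symbols mentioned above are handled, and combined boundary sequences such as L-H% are assumed to have been split into L- and H% beforehand.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One tone: optional "<" (delayed peak), optional "!" or "¡" (down-/upstep),
# the tone itself (H or L), optional "*" (association with the prominent TBU).
TONE = re.compile(r"(<?)([!¡]?)([HL])(\*?)")

# Boundary signs as described in the text; "%" is checked before "-".
BOUNDARY_SIGNS = {"%": "IP boundary tone", "-": "ip/AP boundary tone"}


@dataclass
class Label:
    kind: str                       # pitch accent or type of boundary tone
    tones: List[str]                # e.g. ["L", "H"] or ["!H"]
    starred: Optional[int] = None   # index of the starred tone, if any
    edge: Optional[str] = None      # "left" or "right" for boundary tones
    delayed_peak: bool = False      # True if "<" precedes a tone


def parse_label(label: str) -> Label:
    text = label.replace("+", "")   # "L+H*" and "LH*" are treated alike
    kind, edge = "pitch accent", None
    for sign, name in BOUNDARY_SIGNS.items():
        if text.startswith(sign):
            kind, edge, text = name, "left", text[1:]
            break
        if text.endswith(sign):
            kind, edge, text = name, "right", text[:-1]
            break
    tones: List[str] = []
    starred, delayed, pos = None, False, 0
    while pos < len(text):
        match = TONE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse label {label!r}")
        delay_mark, scaling, tone, star = match.groups()
        tones.append(scaling + tone)
        if star:
            starred = len(tones) - 1
        delayed = delayed or bool(delay_mark)
        pos = match.end()
    # A pitch accent needs a starred tone; a boundary tone must not have one.
    if not tones or (kind == "pitch accent") != (starred is not None):
        raise ValueError(f"ill-formed label {label!r}")
    return Label(kind, tones, starred, edge, delayed)


accent = parse_label("L+<H*")
print(accent.tones, accent.starred, accent.delayed_peak)  # ['L', 'H'] 1 True
boundary = parse_label("%H")
print(boundary.kind, boundary.edge)  # IP boundary tone left
```

With the index of the starred tone available, leading tones are simply `tones[:starred]` and trailing tones `tones[starred + 1:]`, which mirrors the terminology introduced above.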


3.4.4 Delayed peaks in Spanish and the notion of the nuclear accent 


Regarding the issue of rising pitch accents with a delayed (posttonic) peak, it is 
uncontroversial that in many peninsular varieties of Spanish, prenuclear/prefinal 
rises are generally³² delayed in the way described, but in nuclear / final position 
in the utterance, they are not (see Hualde 2002: 103 for a list, since then further 
enlarged, of works concurring with this finding). Other varieties, e.g. Cuzco 
Spanish, do not show this behaviour as strongly (O’Rourke 2005). In those that do, 
however, a pitch accent that is not delayed on a prefinal word causes the word to 
be interpreted as being narrowly focused; equally, oxytonic words (stressed on the 
final syllable) do not usually delay the peak into the following word (cf. Hualde 
2002: 103-105; Face & Prieto 2007). A rather neat unifying explanation is proposed 
by Hualde (2002: 106), based on findings by Nibert (1999) that intermediate phrase 
boundaries follow the non-delayed accents in prefinal position: the delayed reali- 
zation of the peak due to the H tone of the pitch accent is blocked by the presence 
of the boundary tone realized on the last syllable of the word, whether the word 
is in ip-final or IP-final position. A concomitant assumption is that focused con- 
stituents are always directly followed by the right edge of an ip or IP. This is the 
view espoused by Gabriel (2007), based on further empirical evidence and formulated 
in optimality-theoretic (OT) terms.³³ We will also follow this view here and 


32 There are some curious exceptions: Prieto & Torreira (2007: 475) state that the prenuclear peak 
can regularly occur late in the accented syllable, instead of in the postaccentual one, on a first 
accent in a phrase "when the first accent belongs to an utterance-initial phrase which contains two 
accents". 

33 Even though he proposes that ip- or IP-boundaries follow focused constituents, Gabriel (2007: 
191) formulates doubts that it is really the presence of the ip boundary tone which causes the peak 
to be realized earlier, based on the observation that proparoxytones in narrow focus also realize 
can then also define nuclear accents, with Ladd (2008: 134), as the final and only 
obligatory accent in the intermediate phrase. We already saw that in English or 
German, the nuclear accent, the pitch accent assigned to the strongest prominence 
in the phrase, is often the only one. This is mostly not the case in Spanish, where 
words in prenuclear position are also regularly pitch accented (Hualde & Prieto 
2015: 358). However, using the reasoning and evidence given above, we will take 
the nuclear accent in the Spanish case to be not only the rightmost one (which it is 
also in English, all following ones being deaccented, Pierrehumbert 1980: 37), but 
also on the final word in an ip.³⁴ That is to say, an ip-boundary is inserted after the 
nuclear accent in Spanish if it is not already final in the IP. The special status of the 
nuclear accent is also corroborated by the fact that in many languages, fewer pitch 
accents are available in prenuclear position than in nuclear position (cf. Ladd 
2008: 286).³⁵ This is reflected in the Spanish ToBI system, which only recognizes 
two different pitch accents (L*H and L<H*, which we here take to be a 
variant of LH*) in prenuclear position, but five in nuclear position (cf. Hualde & 
Prieto 2015). We will return to the issue of focus and nuclear accent further below 
(section 3.7.3). 


the peak within the stressed syllable. However, Prieto et al. (1995: 438-439) find that at least for 
one of their two Mexican Spanish speakers, the presence of a following phrase boundary clearly 
reduces peak delay also on proparoxytones, showing that tonal crowding effects can persist also 
at distances larger than single syllables. Their other speaker does not show this effect, highlighting 
the importance of considering individual variation in these discussions. 

34 This is not the same as saying that focus is always rightmost in a sentence or in an utterance 
in Spanish, the categorical claim by Zubizarreta (1998) which has since then been amply shown 
to be unsubstantiated (Face 2001; Hualde 2002; Gabriel 2007; Muntendam 2010; Hoot 2012, 2016; 
Uth 2014; Vanrell & Fernández Soriano 2018; Dufter & Gabriel 2016 and the works cited therein). 
The nuclear accent is here defined prosodically as a statement about pitch accent location and 
metrical structure in an intermediate phrase. In the conception espoused here, this only relates 
probabilistically to the interpretative category of focus (Calhoun 2010b). The optimality-theoretic 
constraints that relate focus to prominence and phrasing are also formulated as violable in Gabriel 
(2007: 235, 278). 

35 As an example of a case where this does not hold as strongly, cf. Gussenhoven (2005: 121), who
states that standard Dutch has a large choice of prenuclear pitch accents. However, in a sequence
of several prenuclear accents within an IP, they tend to be of the same type, thus still largely
conforming to the assertion by Ladd (2008: 286) that the type of prenuclear accent within a single “tune
is - or may be - a single linguistic choice”.


3.4.5 Differing degrees of combinatorial freedom in the tonal make-up
of prosodic constituents


Intonational descriptions of languages also differ with regard to the (number of)
phrasal domains at which tones are assigned and whether edge tones occur at the 
right or the left edge of a prosodic constituent. For example, German as analysed in 
Grice et al. (2005) has (monotonal) tones assigned to the right edges of intermediate
phrases (ip; L-, H-, !H-) and to the left (%H) as well as right edges of intonational
phrases (IP; L%, H%, ¡H%, not all of which are attested in free combination with the
ip tones), as well as six monotonal and bitonal pitch accents (L*, H*, LH*, L*H, HL*,
H!H*) available for assignment at prominent positions. Unangan (Eastern Aleut),
according to Taff (1999), makes do with a single monotonal pitch accent (H*), three
monotonal (H-, L-, ¡H-) and one bitonal (LH-) ip-boundary tones and two monotonal
(L% and H%) IP-boundary tones; all of the boundary tones occur at the right edge 
of their respective phrases. Tokyo Japanese assigns two tones (%LH-) at the left and 
one tone (L%) at the right edge of the Accentual Phrase (AP), a single bitonal pitch 
accent (H*L) on lexically accented syllables and provides a choice between four 
(H%, LH%, HL%, HLH%) boundary tone combinations (mostly) at the right edge of 
the IP, according to Venditti (2005); Venditti et al. (2008); Igarashi (2015). 
“Peninsular” Spanish, in the description by Hualde & Prieto (2015), has seven
monotonal and bitonal pitch accents (L*H, L<H* (these two only prenuclear), L*,
H*, LH*, HL*, L¡H*), three monotonal boundary tones at the right edge of the ip (L-,
H-, !H-), and six monotonal and bitonal boundary tones at the right edge of the IP
(L%, H%, !H%, LH%, L!H%, HL%). In the only available descriptions of the (declarative)
intonation of a Quechua variety in AM terms, O'Rourke (2009) tentatively
proposes that Cuzco Quechua assigns an -L tone at the left edge and optionally a
H- or L- tone at the right edge of the ip, and has an inventory of two pitch accents,
L*H, and LH*, the first of which occurs phrase-medially and the second phrase-finally.
O'Rourke (2005: 182–201), discussing interrogative intonation, does not add
a further pitch accent. IP-finally, L% is attested most often, also in questions, but
H% does occasionally occur. As stated before, the intonation of both Huari Quechua
and Huari Spanish is as yet unexplored. Thus the range of variation in tonal inventories is
quite large. A further relevant aspect pertains to the combinatorial freedom of these 
tones in the different languages. While in Spanish and German, there are several 
options to choose from at each point in the prosodic structure that a tone could be 
assigned to, in Japanese, an AP will always begin with a rise, either because of the 
AP-initial %LH- tones or because of a combination of AP-initial L plus the high tone 
of the lexical pitch accent H*L if the first or second syllable in the AP is lexically 
accented, in which case the AP-initial H- is superseded by the lexical H (Venditti 
2005). In addition, the only pitch accent is H*L. That is to say, there is less combinatorial
freedom in the Japanese system compared to the Spanish or German one (cf.
Igarashi 2015: 559–560). The Japanese case is comparable to other languages like
French (Jun & Fougeron 2000, 2002) or Korean (Jun 1993, 2005b) that have also been 
described as having an AP with quite fixed initial and final boundary tones, and 
to the Cuzco Quechua case in the description by O'Rourke (2009). Quite obviously, 
less optionality at a given structural point means fewer possibilities for exploiting 
these options to encode meaning differences, but on the other hand, it means that 
the unit that is delimited by these tones (here, the AP) is made more easily recog- 
nizable, because its edges are signaled by a less variable set of cues. Paradigmatic 
variability at a given point in the prosodic structure therefore stands in a tradeoff 
relationship with the delimitation of larger prosodic units. 

From the point of view of information transmission, this is a tradeoff between 
having many paradigmatically variable cues at each point, each signaling some dif- 
ferential information, on the one hand, and having cues converge on only a few 
units that are signaled so that listeners may adjust their expectations to these units, 
and also recognize when deviations from this convergent pattern are exception- 
ally exploited for the coding of meaning, on the other. Redundant coding is part of 
what constitutes recognizable structure in languages, and it is especially necessary 
in the acoustic signal for human communication to be robust under noisy condi- 
tions (Shannon 1948). Paradigmatically variable prosodic cueing thus is capable 
of encoding more information at each structural point, but it runs a greater risk of 
information loss. By having a greater combinatorial freedom for tones, such cueing 
systems can also encode information syntagmatically at the level of the tones, but 
we will also see that it is less easy to encode information syntagmatically via the 
sequence of the larger units that are delimited by these tones because the make-up 
of the tonal cues that signal their edges is more varied. Paradigmatically fixed 
cueing, on the other hand, while being less free at each individual structural point, 
can invest more resources in ensuring lossless transmission and it is freer to signal 
information syntagmatically at the level of the units delimited by the tones, e.g. by 
varying the size of the few units that are robustly cued in relation to the morpho- 
syntactic units that are contained in them (phrasing). 
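
The paradigmatic side of this tradeoff can be given a rough numerical illustration. The sketch below is not part of the original argument: it assumes, purely for illustration, that all options at a structural point are equally likely, in which case the information that a single choice can carry is bounded by the base-2 logarithm of the number of options. The inventory sizes are the pitch accent counts cited in section 3.4.5; the function name `max_bits` is hypothetical.

```python
import math

# Number of contrasting pitch accents available at one structural point,
# taken from the inventories cited in section 3.4.5.
# Treating the options as equiprobable is an illustrative simplification.
ACCENT_CHOICES = {
    "German (Grice et al. 2005)": 6,
    "Spanish, nuclear position": 5,
    "Spanish, prenuclear position": 2,
    "Tokyo Japanese": 1,
    "Unangan": 1,
}


def max_bits(n_options: int) -> float:
    """Upper bound on information per choice point: log2 of the option count."""
    return math.log2(n_options)


for label, n in ACCENT_CHOICES.items():
    print(f"{label}: {n} option(s), at most {max_bits(n):.2f} bits")
```

A point with a single available accent carries no paradigmatic information under this measure, which is the sense in which such a cue is free to serve delimitation instead.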


3.4.6 “Phrasing” and “Accenting” languages: An appropriate typology?


In fact, Korean, Japanese and French have occasionally been called “phrasing” languages
(cf. Igarashi 2015; also called “edge-prominence languages” in the typology
developed by Jun 2005d, 2014b), because they optimize the (accentual) phrase in
this way and use phrasing to a large degree for the encoding of information structure.
Languages like English, German, or Spanish, on the other hand, have been
called “accenting” languages (“head-prominence languages” in Jun 2005d, 2014b),
because information structure and other pragmatic meanings are mainly encoded 
via the placing of tonal accents and choice of tones. Venditti et al. (1996) show that 
indeed, very similar information structural effects are conveyed via (de-)phrasing 
in Japanese and Korean, on the one hand, and by (de-)accenting in English, on the 
other, which supports the point that these are divergent coding strategies but with 
similar purposes (cf. Ladd 2008: 279–280). While deaccentuation in “accenting” languages
means that pitch accents are not realized on constituents where they would
otherwise be expected, often following a focus, dephrasing in Japanese or Korean 
signifies that AP-tones are not realized following a focus, so that the last AP in an 
IP is the one beginning with the focused constituent. This is also called “prosodic 
subordination” in Venditti et al. (2008), which is an apt term also for the phenom- 
enon observed in the “accenting” languages. They also note that in Japanese, “full” 
dephrasing, i.e. the total deletion of AP tones and lexical pitch accents does not 
always take place and that often instead, tonal movements on focused constitu- 
ents are scaled higher, followed by a substantial reduction in the local excursion 
of subsequent movements, with the difference to “full” dephrasing being gradual 
(Venditti et al. 2008: 484–485).

The same is probably true for “accenting” languages, with a gradual continuum
between “deaccentuation” and “compression” (Kügler & Féry 2017; Vanrell
& Fernández Soriano 2018). In general, the dichotomous typology of “phrasing”
and “accenting” languages is probably misleading. For one thing,
it depends to a certain degree on the descriptive approach chosen: French, for example, has
also been described in a more “accenting” framework (Post 2000, 2002). For another, under
closer inspection, individual languages seem to nearly always occupy an intermediate
position between the two theoretical endpoints (cf. Igarashi 2015: 561). It has
also been argued that phrasing always plays a role for the encoding of information 
structural categories in nearly all languages, and that only in some, accenting is 
an additional optional means for it (Féry 2013), but Kügler & Calhoun (2020) point 
out that there are counterexamples to the universal validity of this claim. Further- 
more, the dimensions of freedom of combinatoriality and “phrasing”-“accenting” 
must clearly be kept apart: Egyptian Arabic realizes pitch accents on the stressed 
syllables of prosodic words, and is thus an “accenting” language. However, unlike 
in German or Spanish, there is no space for accent choice: the only available pitch 
accent is LH*. Additionally, this pitch accent usually occurs on each prosodic word, 
no matter its information structural status (Hellmuth 2005, 2007; Chahal & Hell- 
muth 2014). That means that it also has few options to encode pragmatic meaning 
via accent choice (cf. Igarashi 2015: 561–562), and it tonally optimizes a recurring
prosodic unit, like the “phrasing” languages. Madrid Spanish shares some, but not
all of these characteristics: it is usually described, unlike German or English, as
also realizing a pitch accent on each prosodic word, at least in prenuclear position
(Hualde & Prieto 2015: 358), while postfocally, deaccentuation or compression does 
often take place (Hualde 2002; Face & Prieto 2007; Gabriel 2007; Torreira et al. 2014; 
Vanrell & Fernández Soriano 2018). In addition, as mentioned above, there are 
only two pitch accents available for words in prenuclear position, compared to the 
choice between five for the nuclear position (the last accent in a phrase, cf. Hualde 
& Prieto 2015). Thus Spanish seems to use tone distribution to optimize words
to a certain degree (by the recurrent realization of rising accents prenuclearly),
while the placement of the nuclear accent in rightmost position in a phrase also
serves as a delimiting cue for that phrasal category (cf. Kügler & Calhoun 2020 for
a similar view). At the same time, it maintains a considerable degree of combinatorial
freedom through its tonal inventory. Some of the typological dimensions discussed
are exemplified in the utterances from Tokyo Japanese, Egyptian Arabic, Lima
Spanish and German (Figure 5, Figure 6, Figure 7, Figure 8).




Figure 5: Two declarative Tokyo Japanese utterances with dephrasing, kyo'nen ro'oma-ni/oranda-ni
ikima'shita “(I) went to Rome/Holland last year”, adapted from Venditti et al. (2008: 485). The accent
mark (') marks the location at which the fall from the lexical pitch accent takes place. Both utterances cue
focus on the location, which in a) bears a lexical pitch accent (rooma “Rome”) but in b) doesn't (oranda
“Holland”). Note that the AP-initial H- does not get expressed separately if the mora bearing the pitch
accent is initial or peninitial in the AP.


They all involve instances of “prosodic subordination” (dephrasing or deaccentuation)
in which the highest prominence in the IP occurs on a prefinal constituent
and this is cued, to differing degrees, by increased pitch scaling on that constituent,
and tonal compression up to deletion on following ones. This asymmetry in scaling
effects an interpretation of focus on the most prominent constituent, in accordance
with the elicitation contexts in which the other elements were given but the element
corresponding to the most prominent one in the utterances was asked for.³⁶ Of all


36 Cf. Venditti et al. (2008: 484–486) for elicitation circumstances of the Japanese examples, Féry
& Kügler (2008: 683–684) for those of the German one and Hellmuth (2006: 270–273) for those of

[Figure 6 tiers. Tones: L+H* L- L+H* L+H* L+H* H-H%. Gloss: mum she-is-learning Greek at-night.]

Figure 6: A declarative realization of maama bititʕallim junaani bi-l-leel “mum is learning Greek in the
evening” by an Egyptian Arabic speaker, adapted from Chahal & Hellmuth (2014: 399). The accent mark (')
is placed before the stressed syllable. The context is intended to yield a contrastive interpretation on
maama “mum”, with ju'naani “Greek” given, yet there is no deaccentuation, only some compression.


the examples, it is obvious that the Egyptian Arabic one (Figure 6) shows the least
effect of this context, with the contrastively focused maama clearly receiving the 
highest scaled pitch accent, but the following given ones still robustly accented with 
the only available LH* pitch accent. We can also see a gradient, rather than a cate- 
gorical difference between the Spanish and German examples: before the focused 
constituent dem Rammler in the German example (Figure 8), tonal movement is 
compressed but the constituent der Hammel is still pitch accented (the degree of 
pitch compression/scaling reduction here differs depending on whether the pre- 
focal element is given or not, Féry & Kügler 2008), while following it, the verb is 
completely deaccented. In the second Spanish example (b in Figure 7), the two
prefocal pitch accents on cuatro and policías are clearly not compressed, both in
comparison to the prefocal constituent in the German example, and to the postfocal
pitch accents in the first Spanish example (a in Figure 7). Those are compressed,
but at least the first two, on policías and arrestaron, are still undoubtedly present
and not deaccented. Full deaccentuation most likely takes place on the final word
sospechoso in the second Spanish utterance. Note also the presence of the low


the Egyptian Arabic one. The Spanish examples are spoken by Raúl Bendezú Araujo and recorded
by the author. They were purposely recorded to elicit differing intonation contours dependent on
question context on the same sentence. This is of course not a very natural procedure, since in normal
conversation, (null) pronominalization and elision of given constituents would likely result in
different segmental strings between the two context conditions.


Figure 7: Two elicited declarative realizations of cuatro policías arrestaron al sospechoso “four police
arrested the suspect” by a Spanish speaker from Lima. a) is an answer to the question “how many
police arrested the suspect?” while b) answers “what did four police do to the suspect?”


ip-boundary tone (L-/Lo) directly following the focused constituents in both the 
German and Spanish examples, causing pitch to steeply drop after reaching the 
accent peak, and resulting in an extended final low stretch in the German and the 
second Spanish example. This is where we can make out the most striking differ- 
ence to the Japanese examples (Figure 5). In both Japanese examples, the tonal 
movement on the AP bearing highest prominence is scaled high, which is compa- 
rable to what happens in the examples from the other languages. However, only 
in the first one, where the highest prominence is on the lexically pitch accented 
rooma “Rome”, pitch then also drops to an overall low level, like in the German and
Spanish examples; this is here caused by the increased scaling (relative lowering)
on the L tone of the H*L lexical pitch accent. The lexical pitch accent on the verb

[Figure 8 tiers. Words: weil der Hammel | dem Rammler nachgelaufen ist. Gloss: because the sheep | the buck followed is.]

Figure 8: Declarative realization of weil der Hammel dem Rammler nachgelaufen ist “because the sheep
ran after the buck” by a German speaker, adapted from Féry (2017: 155). Focus is cued on Rammler
“buck”; note the difference between prefocal compression and postfocal deaccentuation.


ikimashita is either very reduced or deleted. In the second one, such a steep drop 
cannot be observed. Instead, pitch actually stays quite high after reaching the high 
tonal target of the AP-initial LH- boundary tone, and only gradually sinks down to
the IP-final L%, with very slight additional movement caused by the pitch accent 
on the verb, which is strongly compressed. Here we can see that dephrasing in this 
case really results in the absence of the tones belonging to the following phrase, 
and that the high pitch following it is not cueing prominence (cf. Venditti et al. 2008:
484–486).

Another important difference is the directedness of phrasing: in Japanese, the 
focused or most prominent constituent begins a new AP that then extends until 
the end of the ip or IP, whereas the analyses for German and Spanish place this 
constituent at the (right) end of an (intermediate) phrase which might have begun 
considerably earlier (further to the left). Note that these differences in directedness 
or prosodic headedness are probably again tendencies, rather than absolutes, as 
Beckman & Pierrehumbert (1986: 285) already point out. A further similarity could 
be seen in the tendency shown in Japanese, Spanish, and German to make the tonal
movement on the focused constituent the last one in the IP before the final boundary
tone, or at least the last one of comparable pitch scaling. Thus there are real
differences in how prominence asymmetries are encoded intonationally
in these four languages (Egyptian Arabic, Japanese, Spanish, and German),
but it also seems that not all of them are actually categorical, and that in order to
adequately analyze them, it is also necessary to pay considerable attention to gradient
factors, such as pitch scaling. The description in O'Rourke (2005, 2009) suggests
that Cuzco Quechua shares attributes with “accenting” languages in that it is said
to realize pitch accents at prominent positions. However, it also shares some with
“phrasing” languages in that it has both initial and final ip-level boundary tones,
although in “phrasing” languages, the relevant domain is usually taken to be the AP.
The relative scarcity of different pitch accents is also more reminiscent of “phrasing”
languages. No evidence is found for dephrasing or deaccentuation. For Huari
Quechua, no comparable findings are available so far. My own analysis will show
that prominence asymmetries are in fact encoded via deaccentuation/dephrasing,
and also argue that both the Spanish and the Quechua data from Huari defy easy
categorization as either “phrasing” or “accenting”.


3.4.7 Restrictions on combinatorial freedom 


Let us now return to a consideration of tonal inventories. Even in languages with a 
large inventory, combinatorial freedom seems to be actually more restricted than 
might be assumed. In this section, I will present evidence that there are
strong constraints which reduce the number of contours at the phrasal level, and that
this aids both in the identification of contours and in consolidating this level as the
domain at which intonational meaning is conveyed. The intonational finite-state
grammar proposed by Pierrehumbert (1980) (see Figure 9b) in principle allows
for any combination of pitch accent plus ip-boundary tone / phrase accent plus
IP-boundary tone (in this order), as chosen from the inventory at each point, to be
realized in American English. At each node in the sequence, there is a free choice
between each of the options available, leading to 8 × 2 × 2 = 32 possible combinations
of pitch accent plus ip and IP boundary tones for American English (leaving
out the initial boundary tone), compared to 2 × 5 = 10 available combinations for Tokyo
Japanese, according to the inventory in Figure 9a.
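
The counts above follow from multiplying the number of options at each node of the finite-state grammar. As a minimal sketch, the free combination can also be enumerated explicitly. The slot sizes are those given in the text; the function name `count_sequences` is hypothetical, and numbered placeholders stand in for the actual tone labels.

```python
from itertools import product


def count_sequences(*slot_sizes: int) -> int:
    """Count all tone sequences when each slot is filled freely."""
    slots = [range(size) for size in slot_sizes]
    return sum(1 for _ in product(*slots))


# Slot sizes as given in the text; placeholders stand in for the tones.
american_english = count_sequences(8, 2, 2)  # pitch accent, phrase accent, boundary tone
tokyo_japanese = count_sequences(2, 5)       # the two choice points counted in the text

print(american_english, tokyo_japanese)
```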

Empirical reality paints a somewhat different picture. Dainora (2001, 2002,
2006) shows, using a corpus of radio speech in American English comprising about
1200 IPs, that out of 100 possible sequences of prenuclear pitch accent plus nuclear
pitch accent plus ip boundary tone plus IP boundary tone, only 44 are attested;
that out of 20 nuclear sequences³⁷ (nuclear pitch accent plus ip and IP boundary
tone), only 18 are attested; and that four of them (H* L- L%, H* L- H%, LH* L- H%,
LH* L- L%, in order of decreasing frequency) account for nearly 80% of all attested
nuclear sequences (Dainora 2006: 112–113). The identity of a pitch accent plus an


37 The pitch accents H*L and HL*, contained in Pierrehumbert's original inventory, are dropped 
in Dainora's investigation because they were originally introduced only to trigger downstep and 
are thus not taken to constitute independent categories, see Dainora (2001: 99). Downstepped H 
tones are also collapsed together with non-downstepped ones, because Dainora (2001: 46–69) finds
that they come from the same distribution.

[Figure 9 node labels. A Tokyo Japanese: Boundary tone, Phrasal high, Lexical pitch accent, Boundary tone, BPM. B American English: Boundary tone, Pitch accent, Phrase accent, Boundary tone.]

Figure 9: Comparative tonal finite-state grammars of Tokyo Japanese (A, above) and American English 
(B, below, originally from Pierrehumbert 1980: 29), adapted from Igarashi (2015: 560). The arrow 
pointing to the left in b) indicates that the pitch accent node can be repeated. 


ip boundary tone is also a considerable predictor for the identity of an IP boundary 
tone, so much so that “in some cases the nature of the boundary tone is almost 
predetermined by other parts of the phrase” (Dainora 2002: 4). A further finding 
is that adjacent sequences of H tones are quite rare, irrespective of whether they 
are pitch accent or boundary tones: less than 16% of all attested contours contain 
such sequences (cf. Table 1 in Dainora 2006: 113).³⁸ A similar observation, albeit


38 To bring this into perspective: even without assuming anything about what part of the nuclear 
contour a tone T belongs to, two tones in a sequence will have a probability of 0.5 * 0.5 = 0.25 of 
being HH, which is already more than the attested 16%. If we take the five pitch accents Dainora 
assumes as fundamental, atomic units, then 12/20 = 60% of all possible nuclear contours (pitch

without data on the frequencies of individual contours, is made concerning Bengali
by Hayes & Lahiri (1991). According to their analysis, Bengali has in nuclear position
a pitch accent (L*, H*, LH*) followed by an optional ip-boundary tone (a PhP
in their terminology, either H_P or L_P), plus an optional IP-boundary tone (H_I or L_I)
and one obligatory IP-boundary tone (H_I or L_I).³⁹ Free combination would yield
3 × 3 × 3 × 2 = 54 possible tone sequences, but only eight of those are attested, each with a distinct
intonational meaning (Hayes & Lahiri 1991: 72). Quite comparably, the summary
of attested realizations in Peninsular Spanish of bebe la limonada “s/he is drinking
the lemonade” in Hualde & Prieto (2015: 389), modestly said to list “some possible
intonations” of that sentence, but based on a very comprehensive overview of
the intonation research on Spanish in at least the last two decades by two of the
leading experts in the field, only gives ten such combinations of prenuclear plus
nuclear pitch accent and IP-boundary tone as well as an optional ip-boundary tone
(see Table 3).
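
The proportions of attested to possible sequences reported so far can be set side by side. The following sketch only restates figures from the text; the decomposition of the Bengali count into slots (three pitch accents, two optional slots with two tones or none, one obligatory slot with two tones) is one reading of the description above that reproduces the figure of 54, and the variable names are hypothetical.

```python
from fractions import Fraction
from math import prod

# Bengali nuclear slots, read from the description of Hayes & Lahiri (1991):
# 3 pitch accents, optional phrase tone (2 tones or none),
# optional IP tone (2 tones or none), obligatory IP tone (2 tones).
bengali_possible = prod([3, 2 + 1, 2 + 1, 2])

# (attested, possible) pairs as reported in the text.
coverage = {
    "American English, full sequences": (44, 100),
    "American English, nuclear sequences": (18, 20),
    "Bengali, nuclear sequences": (8, bengali_possible),
}

for label, (attested, possible) in coverage.items():
    share = Fraction(attested, possible)
    print(f"{label}: {attested}/{possible} = {float(share):.0%}")
```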


Table 3: Attested intonation contours of bebe la limonada “S/he is drinking the lemonade” in
Peninsular Spanish, adapted from Hualde & Prieto (2015: 389).


    bebe      la limonada   Function
a.  L<H*      LH* L%        Statement or command
b.  L<H*      L* L%         Statement or command
c.  LH* L-    L* L%         Statement or command with emphasis on first word
d.  L<H* H-   LH* L%        Statement or command with emphasis on second word. First word is topic.
e.  L<H*      LH* L!H%      Statement of the obvious (see also echo-question expressing surprise)
f.  L*H       L* H%         Information-seeking question
g.  L<H*      LH* HL%       Confirmation-seeking question
h.  L<H*      L¡H* L%       Echo question (surprise etc.)
i.  LH*       H* H%         Quiz question
j.  L<H*      HL* L%        Insistent explanation / Insistent request


This should be compared against the inventory, as stated above (cf. section 3.4.5), 
of 2 prenuclear and 5 nuclear pitch accents, six IP-boundary tones and three 
ip-boundary tones. The SpToBI-tradition has not followed the convention adhered 
to in the descriptions of American English or German that an IP-boundary tone 
must always be preceded by an ip-boundary tone (cf. Figure 9 and Pierrehumbert 


accent × phrase accent × boundary tone) contain at least one sequence of an H followed by another
H (some, like LH* H- H% or H*H H- H%, contain two or even three). The attested 16% clearly
constitute a marked deviation from this expectation.

39 This is the same as saying that IP boundary tones are either monotonal or bitonal, but generally
obligatory, as done e.g. in analyses of Spanish.

1980; Beckman & Pierrehumbert 1986 for English, Grice et al. 2005: 68 for German).
Instead, the ip boundary tone is often taken to occur somewhere before the nuclear
pitch accent, but it is effectively optional (see Table 3). This means that the possible
combinations must include four options for ip-boundary tones, the three tonal
options (L-, H-, !H-, cf. section 3.4.5) plus the option that no tone occurs. Calculated
like this, the attested combinations in Table 3 are only ten out of a possible 2 × 4 ×
5 × 6 = 240 (taking the nuclear pitch accent inventory as not including L*H and
L<H*, which are only attested as prenuclear in Table 3). Counting only possible
nuclear contours would still yield 5 × 6 = 30 possible combinations, of which only eight are
attested. It is doubtlessly the case that even with the intensive research done on
the intonation of “Peninsular” Spanish, the information presented in Table 3 is far
from definitive. It is quite likely that with even more empirical investigations, some
additional contours not listed there might turn up, and it is certainly the case that
the specification of the form-meaning correspondences given could bear considerable
refinement (cf. Fliessbach 2023). These considerations notwithstanding, the
discrepancy between what is theoretically possible and what is attested in terms of 
tonal combinations seems to be equally striking for this variety of Spanish as it is 
for American English or Bengali. 
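
The Spanish figures can be retraced in the same way. The sketch below multiplies the inventory sizes stated in the text and counts the distinct nuclear contours (nuclear pitch accent plus IP-boundary tone) in Table 3. The tone labels are transcribed from the table as read here, so any misreading of the table would carry over; the variable names are hypothetical.

```python
# Inventory sizes as stated in the text for "Peninsular" Spanish.
PRENUCLEAR, IP_TONE_OR_NONE, NUCLEAR, IP_BOUNDARY = 2, 4, 5, 6

full_possible = PRENUCLEAR * IP_TONE_OR_NONE * NUCLEAR * IP_BOUNDARY
nuclear_possible = NUCLEAR * IP_BOUNDARY

# Nuclear pitch accent and IP-boundary tone per row of Table 3 (rows a-j),
# transcribed from the table as read here.
TABLE_3_NUCLEAR = {
    "a": ("LH*", "L%"), "b": ("L*", "L%"), "c": ("L*", "L%"),
    "d": ("LH*", "L%"), "e": ("LH*", "L!H%"), "f": ("L*", "H%"),
    "g": ("LH*", "HL%"), "h": ("L¡H*", "L%"), "i": ("H*", "H%"),
    "j": ("HL*", "L%"),
}

# Rows that share accent and boundary tone collapse into one nuclear contour.
attested_nuclear = set(TABLE_3_NUCLEAR.values())
print(full_possible, nuclear_possible, len(attested_nuclear))
```

Rows a and d, and rows b and c, share their nuclear contour, which is why ten full contours reduce to eight nuclear ones.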

One explanation for this discrepancy is provided in the case of Bengali by 
Hayes & Lahiri (1991: 72-75) who point out that nearly all unattested contours 
are those that include a sequence of like tones (L-L or H-H) and therefore propose 
that the obligatory contour principle (OCP) is effective in Bengali “at the level of 
the entire tune". The OCP is a prohibition against sequences of like tones based 
on observations first made in Leben (1973). It is discussed and named as such in 
Goldsmith (1976). There and in Odden (1986), it is established that even though 
evidence for a principle like the OCP seems abundant in most of the African tone 
languages studied there, it cannot be said to hold universally and that asymmetries 
exist in how the OCP is applied at different levels of linguistic structure, also in 
terms of its directedness. The OCP has since been extended to non-tonal aspects of 
linguistic sound structures (McCarthy 1986; Frisch et al. 2004), and is functionally 
motivated in a broader context as a tendency to avoid sequences of perceptually 
overly similar elements (cf. Boersma 1998, especially ch. 18; Flemming 2002, 2004). 
Boersma (1998: 416) also describes a tendency against the repetition of similar articulatory
gestures, which he takes to be part of the articulatory functional motivation
for the OCP. The language-specific aspect of the OCP is also acknowledged in Hayes 
& Lahiri (1991: 74), and it also shows in the examples we have discussed: while in 
Bengali, any sequence of like tones seems to be prohibited, the data from American 
English seem to suggest a robust but violable tendency to avoid sequences of high 
tones, yet sequences of low tones are quite frequent. 
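
A check for adjacent like tones of the kind that underlies such OCP arguments can be sketched as follows, here applied to the ten contours of Table 3. The sketch makes two simplifying assumptions that are not part of any of the cited analyses: each contour is flattened to its bare sequence of L and H targets, ignoring whether a tone belongs to a pitch accent or a boundary tone, and downstep, upstep and alignment diacritics are disregarded. The labels are transcribed from Table 3 as read here, and the function names are hypothetical.

```python
import re

# Full contours from Table 3 (rows a-j), transcribed as read here.
TABLE_3 = {
    "a": "L<H* LH* L%",
    "b": "L<H* L* L%",
    "c": "LH* L- L* L%",
    "d": "L<H* H- LH* L%",
    "e": "L<H* LH* L!H%",
    "f": "L*H L* H%",
    "g": "L<H* LH* HL%",
    "h": "L<H* L¡H* L%",
    "i": "LH* H* H%",
    "j": "L<H* HL* L%",
}


def tone_string(contour: str) -> str:
    """Reduce a contour to its bare sequence of L and H targets."""
    return "".join(re.findall(r"[LH]", contour))


def rows_with(pattern: str) -> list[str]:
    """Rows whose flattened tone string contains the given tone pair."""
    return [row for row, contour in TABLE_3.items() if pattern in tone_string(contour)]


like_tones = sorted(set(rows_with("LL")) | set(rows_with("HH")))
high_sequences = rows_with("HH")
print(like_tones)
print(high_sequences)
```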
For the Spanish data from Table 3, the same argument cannot be as easily made:
six of the prenuclear + nuclear contours contain sequences of like tones (b, c, d, g, 
i, and j), of those, four contain sequences of high tones (d, g, i, and j); of the nuclear 
contours themselves, three (b/c, g, and i) have sequences of like tones, two of those 
(g and i) consisting of high tones. Can we thus say that Peninsular Spanish entirely 
ignores the OCP, and if yes, what else might account for the discrepancy between 
possible and attested tone sequences? First of all, it is important to recognize that 
for all three languages we are discussing here, the facts we have seen suggest that 
at least at some level, it is the tonal sequence as a whole, without an identification 
of individual tones as belonging to pitch accents or boundary tones, that is impor- 
tant as a domain to which constraints or processes might apply. That assumption 
is inherent in calling upon the OCP to explain the (relative) absence of sequences 
of like tones, no matter whether they are pitch accents or boundary tones, both in 
American English and in Bengali, for which this assumption is explicitly acknowledged
in Hayes & Lahiri (1991: 74). It is also inherent in the practice, followed in
the descriptions of all three languages here as well as several others,⁴⁰ of assigning
intonational meanings to tonal tunes consisting of a combination of at least nuclear 
pitch accent plus boundary tone(s), the nuclear contour. Especially for Spanish, it 
has even been proposed that for the identity of the tune as a whole, it is somewhat 
irrelevant which of the tones it is made up of form the nuclear pitch accent, as 
long as they stay in the same order (Torreira & Grice 2018). The domain for such 
a holistic tune, linked to some pragmatic meaning, is sometimes taken to be the IP 
(Hayes & Lahiri 1991: 52). This finds its parallel on the meaning side in statements
to the effect that an IP encodes “an informational unit or sense unit” (Heusinger
2007: 280, referring to concepts developed in Halliday 1967 and Selkirk 1984,
respectively), that it represents one “informational chunk” which is processed as
such in production and perception, similarly across languages (Himmelmann et al.
2018: 236 and the works cited therein), or that it conveys exactly “one new idea”
(Chafe 1994: 108–119). Actually, Chafe (1994: 57–58) proposes the “one new idea
constraint” to hold for what he calls “intonation units”, which he states are more
similar to the intermediate phrase than the intonational phrase in the description 
of Beckman & Pierrehumbert (1986). Since we here assume with Ladd (2008: 134) 
that it is the intermediate phrase, and not the intonational phrase, which defines 


40 Cf. Grice et al. 2005: 72 for German, Prieto et al. 2015: 45-46 for Catalan, Gili-Fivela et al. 2015: 
191-193 for several varieties of Italian, Frota et al. 2015: 278-279 for European and Brazilian Portu- 
guese, Arvaniti & Baltazani 2005: 95 for Greek. The practice is not applied to Standard Dutch in Gus- 
senhoven 2005: 137, who also claims that there are no restrictions prohibiting any of the possible 
combinations of nuclear pitch accents plus boundary tones (however, of the 24 possible resulting
combinations, he attests only 18, stating that the others are probably rare).
the nuclear accent and the nuclear contour, Chafe's idea is still compatible. Thus,
the relevance of a holistic tune whose domain is the ip/IP seems relatively well 
grounded. If considered from the perspective of the (nuclear) tune, something like 
the OCP or rather its somewhat more abstract correlate of similarity avoidance in 
order to create perceptual contrasts (Flemming 2008) can probably still be invoked
to explain the discrepancy between possible and attested tone combinations
also in Spanish. 
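
The notion of a "sequence of like tones" that underlies this discussion can be made concrete with a small sketch. It assumes that a contour is written as a ToBI-style label string such as L*H L- H%, that only the letters H and L carry tonal value, and that diacritics (*, -, %, +) can be ignored for the purpose of the check. The function names are purely illustrative and do not belong to any existing annotation toolkit.

```python
import re
from typing import List, Tuple

def tone_string(contour: str) -> str:
    """Reduce a ToBI-style label such as 'L*H L- H%' to its bare tones ('LHLH')."""
    return "".join(re.findall(r"[HL]", contour))

def like_tone_sequences(contour: str) -> List[Tuple[int, str]]:
    """Return (position, tone) for every tone identical to its predecessor.

    Positions are zero-based indices into the bare tone string; an empty
    list means the contour contains no sequence of like tones.
    """
    tones = tone_string(contour)
    return [
        (i, tone)
        for i, (previous, tone) in enumerate(zip(tones, tones[1:]), start=1)
        if previous == tone
    ]

# The rise-fall-rise contour alternates throughout, so nothing is flagged.
print(like_tone_sequences("L*H L- H%"))   # []
# A hypothetical contour with two adjacent high tones is flagged once.
print(like_tone_sequences("H* H- L%"))    # [(1, 'H')]
```

Such a check deliberately ignores whether a tone belongs to a pitch accent or a boundary tone, which is exactly the assumption made when the OCP is applied to the tonal sequence as a whole.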

While we cannot provide a definitive answer here, the evidence seen so far 
suggests that only a relatively limited number of tunes at the level of the nuclear 
contour can be usefully associated with meanings in a language. Evidence that 
unfamiliar tunes can slow down processing (Braun et al. 2011) also suggests a way 
that tune frequency at this level can be brought into relation with notions like 
markedness. Potentially, it would make sense that a small number of acoustically 
least eventful contours are used most frequently, in a broad range of situations 
covering both unmarked meanings and more marked meanings when they are 
retrievable from context. This opens up a space for more eventful contours, acous- 
tically and perceptually more prominent, to signal cases in which it is particularly 
necessary to convey a marked meaning even against expectations built up from a 
conversational context. They presumably need to be both acoustically salient and 
of relatively low frequency, so that the mere fact of their occurrence is sufficient to 
capture attention to such an extent that a change of contextual expectations might
be effected. At the same time, if they are to convey quite specific meanings, they 
need to be sufficiently recognizable, which means they must be spaced sufficiently 
far apart from each other in terms of their dynamic acoustic properties. This as 
well as their low frequency (by itself an obstacle to identification) presumably also 
contributes to setting an upper limit on the number of types that are practically 
differentiable, each with its separate meaning. We will revisit some of these issues 
in section 3.7. 


3.5 Relation between tones and the segmental string: 
association and alignment 


In the previous section, we considered aspects of intonation at the level of the 
(nuclear) contour. In this section, we will instead look at the level of individual 
tones, specifically the phonology and phonetics of how they relate to the segmen- 
tal material, and how this differs both crosslinguistically and between types of 
tones. 

Since Pierrehumbert (1980), intonational tones have been classed into dif- 
ferent categories: the tones associated with metrically strong positions are pitch 
accents (T*) and those located at the edges of prosodic constituents are boundary
tones (T%). In the rise-fall-rise contour L*H L- H%, this classification accounts for the
starred L and the final H%, but leaves out the trailing H of the pitch accent and the L-. The
latter was originally called a “phrase accent” in Pierrehumbert (1980) and was 
then reanalyzed as an intermediate boundary tone in Beckman & Pierrehumbert 
(1986: 288). The former, unstarred tone in a bitonal pitch accent is either called a 
leading or trailing tone, as already stated above. Even though we have just seen 
that there are aspects of tonal behaviour, especially in relation to their pragmatic 
function, that suggest it is useful to consider tonal contours in a somewhat holis- 
tic fashion, this does not mean that all of the tones in such a contour-forming tone 
sequence show the same properties with respect to how they relate to the text 
and the prosodic structure. In this regard, the “phrase accents” and unstarred
tones in multitonal pitch accents have often been seen to behave differently from
the starred tones and boundary tones. For the starred tone, metrical structure 
provides an association site (Pierrehumbert 1980: 32-33). For the IP-boundary 
tone, metrical structure is irrelevant but the edge of the IP provides a “straight-
forward” orientation (Pierrehumbert 1980: 32). Thus these two types of tones are
seen as oriented by the prosodic and metrical structure, independent of each
other and of their tonal environment. The phrase accent, on the other hand, “is
found near the end of the word with the nuclear stress even when this is not a
metrically strong position” (Pierrehumbert 1980: 32). Intonational descriptions of
Spanish do not make use of the phrase accent at all, and use the ip boundary tone 
in a less restricted way, as we have already seen in 3.4.7. Regarding unstarred 
tones, Pierrehumbert concludes after investigation of a small corpus produced 
specifically for this purpose and in reference to the trailing H in a L*H pitch 
accent that it “is located at a given time interval after the L*, regardless of the
stress pattern on the material following the accented syllable” (Pierrehumbert
1980: 77).

Even without saying anything about phonetic alignment (the timing of a tonal
event relative to the segmental material), in these descriptions the phrase accent
and the unstarred tone therefore differ from the starred tones and boundary tones
because they are not oriented directly with reference to the metrical or prosodic
structure but only indirectly by reference to other tonal events. From this
description alone it would be hard to argue that either the phrase accent or the
unstarred tone can be associated with a specified TBU, as the starred tone is. At the
same time, they also show “a certain amount of variation” in their placement
(Pierrehumbert 1980: 32), mirroring their less directly determined phonological
status in phonetic terms. We have also already seen that tonal alignment can affect
the (perception of) intonational meaning, in the discussion of the L*H L- H% vs.
LH* L- H% contour of American
English. Meaning differences have also been related to alignment for example in 
German (Kohler 1987, 2005). Thus, a relation clearly seems to exist between align-
ment behaviour and phonological categories of tones (which may then also relate to
meaning differences). This relation has since been further refined. In the following 
sections, we will first take a closer look at the phonetic alignment of pitch accents 
taken to be associated with prominent positions, and then at how the absence of 
such robust alignment can be seen as contributing evidence for the absence of such 
positions at the word level in some languages. The variation in degree of anchor-
ing both across languages and types of tones is highly relevant for the present
work because it also represents one of the dimensions of variation between Huari
Spanish and Quechua.


3.5.1 Segmental anchoring in pitch accents 


On the one hand, a number of findings on the relatively solid temporal alignment 
of the pitch turning points of pitch accents have led to the formulation of the “seg-
mental anchoring hypothesis”, according to which they are aligned “with specifi-
able points in the segmental string” (Ladd 2006: 20). For Greek prenuclear rising
accents, Arvaniti et al. (1998, 2000) showed that the elbow of the low tonal target 
consistently aligned just before (5 ms on average) the consonantal onset of the 
stressed syllable, while the peak of the high tonal target aligned just after (mean 
of 17 ms) the onset of the vowel in the posttonic syllable. That is to say, the dis- 
tance between the low and the high tonal target is dependent on the length of the 
stressed syllable plus the length of the onset consonant in the posttonic, but not 
upon that of the following vowel, and both tones can be said to be stably aligned 
in relation to the stressed syllable and independently of each other, unlike in the 
case of the English L*H. This leads Arvaniti et al. (2000) to propose that both tones 
in this bitonal configuration are associated with the stressed syllable, and starred, 
i.e. L*H*.
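
The difference between the two timing patterns can be illustrated with a minimal sketch. It contrasts a segmentally anchored rise, using the mean values reported above for Greek (L 5 ms before the onset of the stressed syllable, H 17 ms after the onset of the posttonic vowel), with a rise in which the H follows the L at a fixed interval, as described for the English L*H. The segment durations and the 200 ms interval are invented for illustration only, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Word:
    stressed_onset: float       # start of the stressed syllable's onset consonant (ms)
    stressed_dur: float         # duration of the whole stressed syllable (ms)
    posttonic_onset_dur: float  # duration of the posttonic onset consonant (ms)

def anchored_targets(w: Word, l_lead: float = 5.0, h_lag: float = 17.0) -> Tuple[float, float]:
    """Both tones are timed relative to segmental landmarks."""
    l_time = w.stressed_onset - l_lead
    posttonic_vowel_onset = w.stressed_onset + w.stressed_dur + w.posttonic_onset_dur
    return l_time, posttonic_vowel_onset + h_lag

def fixed_interval_targets(w: Word, l_lead: float = 5.0, interval: float = 200.0) -> Tuple[float, float]:
    """Only L is timed relative to the segments; H follows L after an assumed constant interval."""
    l_time = w.stressed_onset - l_lead
    return l_time, l_time + interval

short_syllable = Word(stressed_onset=100.0, stressed_dur=150.0, posttonic_onset_dur=60.0)
long_syllable = Word(stressed_onset=100.0, stressed_dur=230.0, posttonic_onset_dur=60.0)

for word in (short_syllable, long_syllable):
    l_a, h_a = anchored_targets(word)
    l_f, h_f = fixed_interval_targets(word)
    print(h_a - l_a, h_f - l_f)
# 232.0 200.0
# 312.0 200.0
```

Under segmental anchoring, the rise lengthens by exactly the 80 ms that the stressed syllable gains, whereas under the fixed-interval assumption it remains constant. This is the pattern that motivates treating both Greek tones as independently associated.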

On the basis of similar findings, in particular because both the distance between
valley and peak and the distance from stressed syllable onset to peak correlate with
stressed syllable duration, O'Rourke (2005: 105–108) also proposes the association
of both tones (symbolized also as L*H*) with the stressed syllable for nuclear and 
prenuclear pitch accents in both Cuzco and Lima Spanish. However, she also finds 
on the one hand that for her Lima data, no syllable boundary serves as an anchor 
for the nuclear peaks, but since the peak always occurs within the bounds of the 
stressed syllable, association with it is nonetheless assumed (O'Rourke 2005: 108). 
On the other hand, prenuclear peaks seem also quite strongly attracted by the right 
boundary of the word containing the stressed syllable, more so for Lima than for 
Cuzco, which concurs with more prenuclear peaks in Lima being realized after 
the stressed syllable than in Cuzco⁴¹ (O’Rourke 2005: 79–84, 105–106). For the first
prenuclear pitch accent in a phrase in Madrid Spanish, Prieto & Torreira (2007) find 
that the H peak position varies dependent on the syllable structure of the stressed 
syllable, but again because it is in general realized within the stressed syllable, they 
assume association of the H tone with it. For Cuzco Quechua, O'Rourke (2009) finds 
that peaks in final words are realized significantly earlier than in prefinal words, 
with both however still occurring mostly within the tonic syllable. She therefore 
proposes an association of the low tone with the stressed syllable (taken to be the 
word penult), symbolized as L*H, for prefinal words, and of the high tone, symbol- 
ized as LH*, for final words. Another possible analysis she considers is to assign 
LH* throughout and to attribute the differing alignment to the presence or absence 
of an incoming phrasal boundary tone (O'Rourke 2009: 309, note 16). 

Mücke et al. (2009) investigate peak alignment in German rising contrastive 
pitch accents, comparing Northern (Düsseldorf) and Southern (Vienna) varieties, 
prenuclear and nuclear position, syllable structure (open vs. closed syllables) and 
acoustic vs. articulatory measurements. The picture that emerges from the acoustic 
measurements is that while both dialectal background and syllable structure have 
measurable effects (Southern varieties and closed syllables have later peaks), these 
are also (as in O'Rourke's findings on Lima and Cuzco Spanish) subject to consider-
able speaker variation (n = 2 for each variety), resulting in syllable structure
being a significant factor only for nuclear accents overall, and dialectal background 
for neither. Yet the alignment difference between nuclear and prenuclear position 
was found to be significant and substantial for all speakers, leading Mücke et al. 
(2009: 336–337) to call the dialectal differences gradient and those between accent
status discrete and symbolical. Articulatorily, their findings show that latencies 
between peaks and articulatory anchors (opening and closing gestures) are smaller 
than those measured acoustically between peaks and segmental anchors, but do 
not differ much in variability (Mücke et al. 2009: 336). 


41 Note that considerable individual differences between Cuzco Spanish speakers with regard to 
prenuclear peak placement lead O'Rourke to group them into 4 different patterns. The first pattern 
(A) is essentially the same as that exhibited by Lima speakers, with all prenuclear peaks realized post- 
tonically, while in the last (D) almost all prenuclear peaks are realized within the stressed syllable. 
Patterns B and C are in between, with prenuclear late peaks more frequent on subjects than on verbs
in the SVO sentences O’Rourke elicited. Pattern D was produced by the largest group (A: n=3, B: n=4,
C: n=3, D: n=5). Interestingly, the grouping of speakers according to these patterns could not be corre-
lated to their status as either Spanish monolinguals, early Quechua-Spanish bilinguals (<5 years) or
late Quechua-Spanish bilinguals (having learned Spanish only after entering school), but as a whole, 
Cuzco speakers thus behaved clearly differently from Lima speakers (for details, see O'Rourke 2005: 
79-86). 
In summary, it seems that languages, speakers and individual tones in pitch
accents differ with respect to the degree that they have invariable segmental 
anchoring behaviour. It is perhaps worth emphasizing the relatively large role of
individual variation found by the studies discussed in this section, especially
when considering that with the exception of O’Rourke (2005), none had more than 5
experimental subjects. Analysis of data from 72 speakers in total of German (n=35)
and Neapolitan (n=17) and Pisa Italian (n=20) has shown that this alignment vari-
ation is so far-ranging that Niebuhr et al. (2011) suggest speakers might even follow
different global strategies, with one group broadly seeking to align f0 turning points 
with segmental landmarks, and the other seeking to realize contour shapes specific 
to pitch accents. Another summary finding seems to be that there is both evidence 
for independence between the individual tones making up bitonal pitch accents 
(their individual segmental anchoring) and for considering their movement as a 
single unit (the fact that their alignments are never totally independent of each 
other, cf. Ladd 2006: 27-28). Different bitonal pitch accents in different languages 
tend more to one or the other side of this continuum: in the English L*H, with the 
“fixed temporal distance” at which the H follows the L*, the two tones are less inde- 
pendent of each other than in the Greek L*H*. This difference recalls one that Grice 
(1995) has proposed to differentiate within English bitonal pitch accents: those with 
trailing tones (T*T), with the trailing tone occurring at a given distance after the 
starred one, are “melodic units”, in which the unstarred tone is not directly associ- 
ated, while those with leading tones (TT*) are “tone sequences”, in which both are 
somewhat independently associated (Grice 1995: 216–219). Investigations into the
coordination of articulatory gestures (e.g. Katsika et al. 2014; Tilsen 2016, 2019) are 
likely to provide more fine-grained and adequate explanations also of the effects 
of larger prosodic domains on peak alignment (cf. also Ladd 2006: 33-35), but this 
cannot be dealt with here. In sum, tones belonging to pitch accents have shown 
themselves to be aligned relative to a TBU assigned to a prominent position in the 
metrical structure, indicating association with that TBU. Where such independent 
alignment cannot be found, independent association is also in doubt. 


3.5.2 Secondary association and tonal spreading 


After having introduced segmental anchoring in 3.5.1, we can now turn to two 
mechanisms by which phonological tones can relate to the segmental string without 
independently associating: secondary association and tonal spreading. They make it
possible to treat contours as similar that share the same number and type of turning
points but differ in the way high or low stretches are extended across material, and to
differentiate between points of similar tonal height according to whether pitch is
actively manipulated there due to a relevant prosodic position, or simply main-
tained due to a specification from elsewhere. An optimality-theoretic approach to
these issues is also introduced. This is crucial for the later analysis in sections 5.3
and 6.3, because it will allow for a principled generalization across superficially
different contours containing such stretches in both Huari Quechua and Spanish,
and will make it possible to demonstrate how the different intonational variants we
will encounter are relatable to each other via stepwise changes in constraint rankings.

Grice et al. (2000) study the “Eastern European Question Tune” (EEQT), a
polar question contour consisting of a final rise-fall (LHL) in several related (vari-
eties of) Eastern European languages: Standard and Transylvanian Hungarian,
Standard and Cypriot Greek, and Standard and Transylvanian Romanian. They
argue that in all varieties, the phonological representation is L* H- L%, i.e. the H
tone is a phrase accent not associated with the nuclear accent and thus not “prom-
inence-lending” in any of the varieties, but the position of its peak seems to vary
discretely, rather than gradually, between the varieties. In a configuration in which 
nuclear stress is on the prefinal word in an IP, producing a low tonal target on the 
nuclear stressed syllable followed by a low stretch throughout the nuclear word, 
the H peak is aligned on the penult of the final word in Standard Hungarian (unless 
that is also the initial, i.e. stressed syllable, in which case the peak moves to the 
final syllable) and on the penult or final syllable in Cypriot Greek, independent of 
which syllable bears stress in the final word, and on the postnuclear stressed sylla- 
ble (in the final word) in Standard Greek and Romanian. The Transylvanian varie- 
ties show the same behaviour as their standard counterparts, except that instead of 
the low stretch following the L*, pitch directly rises to form a plateau extending to 
the position at which the peak occurs in the standard varieties. In order to analyze 
this behaviour, Grice et al. (2000) take recourse to a mechanism originally proposed 
in Pierrehumbert & Beckman (1988) for Japanese, secondary association. Pierre- 
humbert & Beckman take the AP-initial H- tone in Japanese to associate primarily 
to the left edge of the AP, but secondarily to the second mora of the AP. For Japanese, 
this (secondary) association is reflected in the findings by Ishihara (2006: 72) that 
the AP-initial peak in unaccented words is quite stably aligned just at the beginning 
of the third mora, although Venditti (2005: 181) states that its alignment can also 
vary considerably. Returning to the EEQT, Grice et al. (2000) propose to analyze 
the H- phrase accent as primarily being associated with the right phrase edge, but 
secondarily with the position at which the peak is found. In the Transylvanian vari- 
eties, it is secondarily associated both with the nuclear stressed syllable and that 
additional position, via tone copying, not spreading (i.e., the tone does not associate 
with each intervening syllable; see Table 4 for a summary of the secondary associ- 
ation sites). 
Table 4: Secondary association of H- phrase accent in the Eastern European Question
Tune (EEQT), adapted from Grice et al. (2000: 158, 161).


Variety                   Nuclear accent in non-final word           Nuclear accent in final word
Standard Hungarian        penult                                     penult
Standard Greek            postnuclear stress                         final
Cypriot Greek             penult/final                               final
Standard Romanian         postnuclear stress                         final
Transylvanian Romanian    nuclear syllable and postnuclear stress    nuclear syllable
Transylvanian Hungarian   nuclear syllable and penult                penult


Gussenhoven (2000a, 2000b, 2002b, 2004) develops a slightly different intonational 
phonology based on similar considerations. He takes phrase accents to be boundary 
tones, and adopts the notion of phonological alignment from prosodic morphology 
(cf. McCarthy & Prince 1993, 2001[1993]). Prosodic constituents (among which tones 
are counted) can align with the edges of other prosodic constituents. The relative 
ranking of optimality-theoretic phonological alignment constraints for the tones 
in a given input determines both their sequence and proximity to each other in 
the output. Because his model is couched in OT terms, constraints are formulated
together with their conflicting counterparts. For alignment, this leads easily to sit- 
uations in which tones are aligned in opposing directions (multiple alignment), 
and without any higher-ranking constraints intervening, the results are long pitch 
stretches, such as the plateaux in the Transylvanian varieties of the EEQT, in which 
the intervening TBUs are not associated to the tone (which would be spreading). 
The endpoints of such stretches do not have to have the same phonological status. 
Gussenhoven only allows association with TBUs, but not with edges of prosodic con- 
stituents. Association acts in two directions via two families of constraints, those 
that associate classes of tones with TBUs (e.g. H → TBU) and those that associate
classes of TBUs with tones (e.g. σ ← T). In this way, a tone available at a position can
associate with that position because it is a tone that has to associate, or because it is 
a position that has to associate, or it does not associate at all (if the constraints that 
would cause association in this case are outranked by a constraint militating against 
association). That means that in principle, tones in his model can be only aligned 
but not associated, and still be realized, if no constraint deleting unassociated tones 
is high-ranked. Secondary association, in Grice et al.'s terms, can then result from
multiple alignment, with the tone associating with a relevant TBU
at both edges, or only at one of them. Additionally, an extended pitch stretch could
also remain unassociated at both edges in Gussenhoven's model. Importantly here,


42 For a more detailed introduction into Gussenhoven's OT model of intonation, see section 5.3. 
the status of a tone as associated with a TBU or “only” aligned with some prosodic
constituent is taken by Gussenhoven (2018: 406–407) to be reflected in its phonetic
alignment behaviour: if a tone’s peak or valley alignment can be shown to be indif-
ferent to phonological structure, this is taken as an indication of a lack of association.
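
How the relative ranking of such constraints decides whether an aligned tone also associates can be shown in a toy evaluation. The sketch below is not Gussenhoven's formalization: the two candidates, the constraint label NoAssoc for a constraint militating against association, and the stipulated violation counts are all simplifying assumptions made for illustration. Only the evaluation procedure itself (comparing violation profiles in ranking order) is standard OT.

```python
from typing import Dict, List

# Stipulated violation profiles for a tone that is aligned at a phrase edge.
# "T->TBU" stands in for a constraint requiring the tone to associate;
# "NoAssoc" is a hypothetical constraint penalizing any association.
CANDIDATES: Dict[str, Dict[str, int]] = {
    "aligned and associated": {"T->TBU": 0, "NoAssoc": 1},
    "aligned only":           {"T->TBU": 1, "NoAssoc": 0},
}

def optimal(ranking: List[str], candidates: Dict[str, Dict[str, int]] = CANDIDATES) -> str:
    """Return the candidate with the lexicographically smallest violation
    profile when constraints are read from highest- to lowest-ranked."""
    return min(candidates, key=lambda name: [candidates[name][c] for c in ranking])

print(optimal(["T->TBU", "NoAssoc"]))  # aligned and associated
print(optimal(["NoAssoc", "T->TBU"]))  # aligned only
```

Reversing the ranking of just two constraints switches the output from an associated to a merely aligned tone, which is the kind of stepwise re-ranking the model relies on to relate intonational variants to each other.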

With this additional specification, the Gussenhoven model is in principle 
capable of distinguishing between more alignment-association scenarios than the
Grice et al. (2000) model: in the study on German mentioned above, Mücke et al. 
(2009: 335-336) identify a second low tonal target in their data as evidenced by 
a low elbow following the nuclear (but not the prenuclear) pitch peak at a fairly 
invariable distance (some 150 ms later), independent of the segmental material. 
They take the presence of this additional low tonal target in the nuclear condition 
to be the reason for the significantly earlier peak alignment of the nuclear pitch 
accents compared to the prenuclear ones, comparable with the analysis for post- 
nuclear deaccentuation in German by Féry (2017) and also with the analysis of 
nuclear phrasing for Spanish espoused here. However, they analyse it as a phrase 
accent secondarily associated with the nuclear stressed syllable. This gives this low 
tone phonologically the same status as e.g. the Japanese AP-initial H, or the EEQT 
high phrase accent tone. However, the latter Grice et al. (2000) have shown to be 
sensitive to discretely variable aspects of structure in its dialectal variation (see 
Table 4), and the former is also much more sensitive to structure as evident from 
its temporal alignment (cf. Ishihara 2006). This difference in sensitivity to structure 
between the German postnuclear low tone, on the one hand, and the EEQT high 
phrase accent and the Japanese AP-initial H, on the other, could be seen as an argu- 
ment against assigning them the same status. Indeed, it should be at least possible 
to keep them apart in their representation: in the Gussenhoven-style model, the 
German low tone found by Mücke et al. (2009) would certainly be analyzed as an
additional leftward alignment of the upcoming low IP-boundary tone without asso- 
ciation. This can be seen from analyses of comparable low phrase accents in Dutch 
by van de Ven & Gussenhoven (2011), in West Germanic varieties by Peters et al. 
(2015), and in English according to the analysis by Gussenhoven (2018) of the find-
ings reported on in Barnes et al. (2010).⁴³ Because in principle it allows for a more
fine-grained representation (which does not mean that it is necessarily the more 


43 Barnes et al. (2010: 989) themselves take the turning point to be “epiphenomenal, the result of a 
constraint on the phonetic implementation of deaccenting in the postnuclear region: The fall from 
the level of the H* maximum to the level of the eventual L- at the phrase boundary (i.e., E2) must 
be accomplished sufficiently swiftly to avoid creating the percept of a pitch accent (L* or down- 
stepped H*, for example) on a lexically stressed syllable within the postnuclear region.” Arguably, 
the unblocked leftward alignment of the L%, resulting in a flat postnuclear stretch is precisely what 
would constitute deaccentuation here. 
adequate representation; further studies will have to decide that), I am adopting
the Gussenhoven-style model here together with its heuristic about the association 
status of a tone based on its alignment behaviour. 
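
One possible way of operationalizing this heuristic is sketched below. It compares how much a tonal target's timing varies relative to a segmental landmark with how much it varies relative to the preceding tonal target, using the population standard deviation as a measure of stability. This operationalization, the function name and the measurements are assumptions made for illustration; the numbers are invented, loosely modelled on the roughly constant 150 ms peak-to-elbow distance reported by Mücke et al. (2009).

```python
from statistics import pstdev
from typing import Sequence

def more_stable_reference(
    tone_times: Sequence[float],
    landmark_times: Sequence[float],
    prev_tone_times: Sequence[float],
) -> str:
    """Name the reference point relative to which the tone's timing varies less."""
    sd_landmark = pstdev([t - s for t, s in zip(tone_times, landmark_times)])
    sd_prev_tone = pstdev([t - p for t, p in zip(tone_times, prev_tone_times)])
    return "segmental landmark" if sd_landmark < sd_prev_tone else "preceding tone"

# Invented measurements in ms for three tokens.
peaks = [300.0, 340.0, 410.0]           # preceding H peak
elbows = [450.0, 492.0, 559.0]          # low elbow, about 150 ms after the peak
syllable_ends = [380.0, 470.0, 500.0]   # candidate segmental landmark

print(more_stable_reference(elbows, syllable_ends, peaks))  # preceding tone
```

A result of "preceding tone" would, under the heuristic adopted here, count as evidence that the tone is aligned but not associated with a TBU. A realistic analysis would of course need many more tokens and a statistical model rather than a bare comparison of two standard deviations.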


3.5.3 Representational possibilities for the association and alignment 
of tones in languages without word stress 


We can now extend the discussion on variability in tonal anchoring to languages 
without word stress. Since we are pursuing the hypothesis that Huari Quechua has 
no word stress at all or that it is almost irrelevant in its phonological system, this 
section will provide criteria from alignment and association that can be brought to 
bear on the matter. Following the Gussenhoven-style approach, a phrase-final HL 
sequence can be represented in at least four different ways: 


(13) Representational possibilities for a phrase-final HL sequence, adapted from 
Maskikit-Essed & Gussenhoven (2016: 355) 


a. ... ma ma ma)ω)φ)ι           b. ... ma ma ma)ω)φ)ι
       H*+L                            H* L%

c. ... ma ma ma)ω)φ)ι           d. ... ma ma ma)ω)φ)ι
       H% L%                           HL%

In (13)a, the starred H tone is associated with the syllable bearing word stress, and the 
L tone is trailing (which means that it is left-aligned with the right edge of the H). In 
this phrase-final configuration, it is unclear whether the model would have different 
phonetic alignment predictions for H*+L and H* L% here (L% being minimally right-
aligned with the right edge of φ/ip or ι/IP). This would also depend on the constraints
on tonal crowding. Rathcke (2016) demonstrates that falls in pitch accents and those 
due to boundary tones in German and Russian are treated differently in terms of com- 
pensation strategies under time pressure (truncation, compression, tonal re-align- 
ment), but also that these strategies are language-specific to a certain degree. 


3.5.3.1 French 

(13)b, in which no word stress exists but the H tone forms a pitch accent associated 
with a postlexically determined position (the final syllable in the phrase), is exem- 
plified by French, according to Maskikit-Essed & Gussenhoven (2016: 356). In (13)c, 
two boundary tones simply align rightmost and associate with the rightmost avail-
able TBUs. In (13)d, finally, tones are only aligned with the right edge of the phrase,
but remain unassociated. Maskikit-Essed & Gussenhoven argue that (13)b is differ- 
ent from (13)c, the latter of which they say is exemplified by Korean AP boundary 
tones, because in French, whether a postlexically assigned phrase-final position is 
eligible for accentuation can be exceptionally determined by the morphophonolog- 
ical makeup of the phrase (see Figure 10 for the intonational structure of (Seoul) 
Korean and French): without citing sources, Maskikit-Essed & Gussenhoven (2016: 
356) state that in French que sais-je? “what do I know?”, the AP-final peak cannot be
aligned on, and the corresponding H tone cannot associate with, the pronominal clitic je,
but that this is possible for the pronominal clitic le in prends-le “take it”.


[Figure 10: tree diagrams not reproduced.
Left panel, (Seoul) Korean: IP > AP > phonological word (W) > syllable (σ); the AP is marked by the
tone pattern T H L Ha, where T = H if the AP-initial segment is aspirated or tense C, /h/, or /s/, and
L otherwise; the IP is marked by a boundary tone %.
Right panel, French: IP > AP > function word (Wf) and content word (Wc) > syllable (σ); the AP carries
the tones LHi and LH*; the IP is marked by a boundary tone %.]


Figure 10: Intonational structure of (Seoul) Korean, left, and French, right, adapted from Jun (2007: 
151), and Jun & Fougeron (2002: 152), respectively. 


This is a somewhat controversial analysis. According to Welby (2006: 347), the 
general view is that the AP-final H will associate with the last full vowel, which 
excludes schwa, the vowel in both le and je (if one is realized at all). Welby & Loev- 
enbruck (2006: 114) find that in contrast to this generalisation, the final peak can 
occasionally align on an AP-final schwa, but do not relate this to any lexical or 
morphosyntactic difference. Féry (2017: 182) distinguishes between two kinds of 
AP-final schwa in French: a postlexically inserted schwa, which cannot act as TBU 
for the final H tone, and an “underlying” schwa, which can. She explicitly gives
examples including both le and je for the latter category. In spite of this, there is 
some agreement that the AP-final accent (LH*) in French is a pitch accent, indeed 
associated with what should perhaps be called the final licit TBU in the phrase, 
while the AP-initial accent (LHi or LH-) is a phrase accent and not associated with a 
TBU (Jun & Fougeron 2000, 2002; Welby 2006; Welby & Loevenbruck 2006; D’Impe-
rio et al. 2007). This is argued to be so because while the former is stably temporally
aligned (allowing only for a kind of discrete variation in alignment with either the 
AP-final vowel or the penultimate one if the final one is a schwa) and the syllable on 
which it is realized is longer and louder than its neighbours, the latter is optional, 
varying in its alignment within the first three syllables of the AP and not accompa- 
nied by more loudness and longer duration (Jun & Fougeron 2002: 159). 

Welby (2006) is able to further differentiate this picture based on measure- 
ments of phonetic alignment: in the initial rise, the L is stably aligned at the left 
edge of the first content word and thus taken to be associated both with that word’s 
left edge and optionally secondarily with the left edge of the AP⁴⁴ (Welby 2006:
364–366). In Gussenhoven’s model, association is only possible with TBUs, but not
with constituents (Gussenhoven 2004: 155, 2018: 408–409). Yet the stable alignment
of the initial elbow with the left edge of the first content word in Welby's data indi- 
cates association. It could be modeled with constraints that align it with both edges 
(with that aligning it with the content word edge ranked higher) and a constraint 
that stipulates that it must associate with whichever TBU is available in its location 
(L > TBU). The AP-final H is consistently aligned in the final syllable, accompanied 
by syllable lengthening and also interpreted as the starred tone of a pitch accent, 
albeit one that is not “prominence-lending” (Welby 2006: 364–365). Thus the two
outside tones are stably aligned with segmental landmarks, even across different
speech rates, and are taken to be associated. The two middle tones are different: the
initial H has highly variable alignment not just within one syllable but extending 
over at least the initial two syllables of the first content word and the final L varies 
equally in its alignment. Both are taken not to be associated, yet “edge-seeking”,
i.e. showing alignment tendencies towards their respective AP-edges (in terms of
the Gussenhoven model), but they are also not realized at a relatively constant distance
to their initial L and the final H, respectively, varying also with speech rate
(Welby 2006: 366–367). Thus, the French AP seems to consist of tones belonging to
at least two of the configurations above. The position of all tones is postlexically 
determined, so (13)a is not appropriate. However, according to Welby's (2006) anal- 
ysis, both the AP-final H* and the AP-initial L should be classed in category (13)b 
because they are clearly associated with a postlexically determined position whose 
eligibility is itself subject to (morpho-)phonological constraints. Both the middle 
tones, initial H and final L, should most appropriately be classed in category (13)d,
since they are not associated (independent of whether they form part of bitonal


44 Because it forms a low stretch if the content word is preceded by function words that in most 
cases extends to the beginning of the AP.

accents with the external tones). Category (13)c is perhaps best occupied by the Japanese
AP-initial H- in cases of unaccented words, because it is associated (according
to the findings of stable alignment by Ishihara 2006), yet its position can simply be 
described as aligning as far to the left as possible and associating where the leftmost
free TBU (mora) is available, resulting in association with the second mora of the 
AP, because the initial one is occupied by the L tone. For Korean, which Maskikit-Es- 
sed & Gussenhoven (2016: 355) take to exemplify this category, it has been found 
that at least the AP-initial peak is also sensitive in its alignment to the morphosemantic
makeup of the words phrased in it (Kim 2013: 79–84), so the AP-initial
H can also be thought of as exemplifying category (13)b. Note that
the difference between the French AP-final H*, which is analyzed as a pitch accent 
because it also affects duration and intensity, and the AP-initial L, which does not, 
is actually not expressible in the typology in (13). This shows that in a language 
arguably without word stress (cf. also Féry 2017: 180–184), accents can also differ
along this dimension. 


3.5.3.2 Tashlhiyt Berber 

A further relevant case is that of Tashlhiyt Berber as investigated by Roettger (2017). 
Using a statistical assessment of durational and intensity differences between sylla- 
bles as a proxy for prominence in a production study, Roettger (2017: 48-58) finds 
no consistent asymmetries that could be interpreted as evidence for word stress, 
and in particular no evidence that word stress is final as claimed before in the literature.
His findings on peak alignment are highly interesting because they relate
variability in peak placement to “phonetic enhancement” of tonal events, i.e.
when syllables on which tones occur are also longer, louder and produced with 
more peripheral vowels (Roettger 2017: 47, 135, 144). In Tashlhiyt Berber, polar 
questions (both neutral and echo) and statements with a contrastive interpretation 
are realized with a very similar rise-fall (LHL) tonal movement. In questions, this 
is realized on the IP-final word, while in contrastive statements it is realized on the 
word which is contrasted against an alternative in the context. In cases where the 
contrasted word is phrase-final, these two positions obviously fall together. Yet the 
utterance modalities also differ in terms of their peak alignment in two additional 
ways:⁴⁵ on the one hand, quasi-discretely, such that peaks in statements are bimodally
distributed with one mode towards the end of the penult and the other some


45 They also differ in terms of global pitch level over the utterance and local pitch excursion on 
the pitch peak, with scaling in both cases such that polar questions > echo-questions > (contrastive) 
statements, a difference that also proves robust in perception (Roettger 2017: 76–79, 93). This reflects
a crosslinguistically frequent tendency (cf. the works cited in Roettger 2017: 65).

way into the final syllable of the word bearing the pitch peak, while in questions,
peaks are unimodally distributed with a majority (but by no means all) located 
inside the final syllable (Roettger 2017: 80–81). On the other, they also differ gradually
in that, even among only those tokens that realize the peak on the final syllable,
statements align the peak earlier than questions. In a perception experiment
only the former, but not the latter, difference significantly affected ratings as either 
statement or question, and with a considerable amount of variability both between 
and within test subjects (Roettger 2017: 92–95). The syllable on which the peak was
realized was phonetically enhanced insofar as it was louder and longer, and also
sounded more “prominent”, although this was not systematically tested (Grice et al.
2015: 249, 254; Roettger 2017: 85, 135). In addition to utterance modality, peak place- 
ment is also affected by sonority differences between the syllables making up the 
word. Tashlhiyt has few vowels (/i a u/) but many and diverse consonants, all of 
which, including voiceless stops, are allowed as syllable nuclei, resulting in a large 
number of vowelless words in the lexicon and comparatively long vowelless strings 
in actual speech (Roettger 2017: 37–38). Grice et al. (2015) found in a study on bisyllabic
words that peak placement was attracted to the syllable containing the more
sonorous nucleus, showing a preference both for vocalic nuclei over consonantal
ones as well as for more over less sonorant consonants, with liquids > nasals > voiced
obstruents > voiceless obstruents (thus, e.g. /tu.gl/ “she hanged” and /tr.ks/ “she hid”
were more likely to realize the peak on the penult, and /tn.za/ “it was sold” and
/nf.d*g/ “I learnt” more likely to realize it on the final syllable, cf. Grice et al. 2015:
248–251, 263). It also showed a preference for heavier (with a coda) over lighter
syllables and a general preference for being rightmost; all these factors (including 
utterance modality) seem to be independent of each other and probabilistic rather 
than categorical: only when several of them came together in the same direction 
were counts of 100% reached for peak placement in one of the two syllables, but not
by all speakers, and with within-speaker variability present even in the exact same 
condition on the same word (cf. Grice et al. 2015: 251; Roettger 2017: 86-88). The 
situation is exacerbated further when words are considered that contain no sonorants,
with either voiced (e.g. /tb.dg/ “she was wet”) or unvoiced obstruents in the
syllable nuclei (e.g. /tk.ff/ “it dried”): in those cases, a) the peak is either not realized
at all, or b) it is realized on the word preceding the non-sonorant target word,
or c) it is realized on a schwa-like vowel inserted somewhere in the word (Grice 
et al. 2015: 254-255; Roettger 2017: 98-101). These three alternatives were found 
regardless of utterance modality, but the presence of an inserted schwa in the target 
word and the realization of the peak on the preceding word (b) never occurred 
together in a word consisting only of voiceless obstruents, i.e. when a schwa was 
inserted in such a word, it always carried the tonal peak (Roettger 2017: 100). Roettger
proposes that there are two types of schwa in Tashlhiyt: one is purely the result
of articulatory timing, when two constricting gestures are timed far enough apart
that a transitional period occurs (“gestural underlap”) during which airflow can
pass unhindered, resulting in a vowel-like sound (a “transitional vocoid”) as long as
the vocal folds are vibrating (Roettger 2017: 106–108). Such an account is of course
insufficient for words consisting only of voiceless segments, since no phonation will 
occur. For those, he proposes a true epenthetic vowel, a schwa that is postlexically 
inserted in voiceless environments precisely in order to act as a TBU (Roettger 2017: 
125-128). Its distribution is not categorically determined, and seemingly affected 
by a variety of heterogeneous factors, including sociolinguistic ones, that do not 
correlate with peak alignment variability (Roettger 2017: 113-114, 124). The differ- 
ence in status between the two schwas accounts for the fact that in voiceless words, 
a schwa never cooccurs with the peak realized on the preceding word: there, the schwa
is the postlexical epenthetic vowel inserted to serve as TBU, whereas in words with
voiced obstruents, the transitional vocoid can occur but not serve as TBU because it 
is invisible to the phonology (cf. Roettger 2017: 126-128). 
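
The sonority-based preference reported by Grice et al. (2015) for bisyllabic words can be illustrated with a minimal sketch. The numeric ranks, the segment classification and the function name are assumptions made for illustration only and are not part of their analysis. The sketch also treats the preference as categorical, whereas the reported pattern is probabilistic and interacts with syllable weight, the general rightmost preference and utterance modality; it is not meant to cover words without sonorants, for which the three alternatives described above apply.

```python
# Sonority classes from most to least sonorous, following the ordering
# vowels > liquids > nasals > voiced obstruents > voiceless obstruents.
# The numeric ranks are an assumption of this sketch; only their order matters.
SONORITY_RANK = {
    "vowel": 4,
    "liquid": 3,
    "nasal": 2,
    "voiced_obstruent": 1,
    "voiceless_obstruent": 0,
}

# Hypothetical classification of a few nucleus segments, for illustration only.
NUCLEUS_CLASS = {
    "a": "vowel", "i": "vowel", "u": "vowel",
    "l": "liquid", "r": "liquid",
    "n": "nasal", "m": "nasal",
    "b": "voiced_obstruent", "d": "voiced_obstruent", "g": "voiced_obstruent",
    "f": "voiceless_obstruent", "k": "voiceless_obstruent", "s": "voiceless_obstruent",
}


def preferred_syllable(penult_nucleus: str, final_nucleus: str) -> str:
    """Return 'penult' or 'final' for the syllable with the more sonorous nucleus.

    Ties go to the final syllable, mirroring the general rightmost preference.
    """
    penult_rank = SONORITY_RANK[NUCLEUS_CLASS[penult_nucleus]]
    final_rank = SONORITY_RANK[NUCLEUS_CLASS[final_nucleus]]
    return "penult" if penult_rank > final_rank else "final"


# Nuclei of the example words cited in the text: (penult nucleus, final nucleus).
examples = {"tu.gl": ("u", "l"), "tr.ks": ("r", "s"), "tn.za": ("n", "a")}
for word, (penult, final) in examples.items():
    print(word, preferred_syllable(penult, final))
```

For the three cited words, the sketch selects the penult for /tu.gl/ and /tr.ks/ and the final syllable for /tn.za/, in line with the tendencies reported above; a probabilistic model would be needed to capture the within-speaker variability.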

Roettger (2017) does not offer a full phonological analysis of Tashlhiyt intona- 
tion, because only certain aspects of its system were considered. However, he takes 
the discrete variability in stable alignment to different syllables depending on the 
factors outlined and the “phonetic enhancement” of the syllables the peak is aligned
with as evidence that the H tone is not just aligned, but associated with the syllable 
it occurs on (Roettger 2017: 144). Note that this is not the same as saying that these 
syllables bear word stress, which is clearly not the case (since there are realizations 
of the same lexical item by the same speaker under the same experimental condi- 
tions, in one of which the peak is located on the penult, while in the other it is on 
the final syllable, and that syllable is phonetically enhanced in both cases); thus the 
Tashlhiyt case is further evidence that the determination of a culminative position 
at the lexical level and the acoustic correlates of increased duration and intensity, 
usually associated with stress accent, are orthogonal to each other (we have already 
seen that Japanese is in a sense the inverted case). It also seems that this behaviour 
is conditioned by the type of word the tonal movement is realized on and/or the 
utterance type: in a study on wh-words (question words) in Tashlhiyt, Bruggeman 
et al. (2017: 20-21) found that essentially the same rise-fall (LHL) contour is real- 
ized on them as on the final word in polar questions and the contrasted word in 
contrastive statements, but that while the peak is always aligned within the ques- 
tion word, no evidence for systematic alignment with any syllable nor for phonetic 
enhancement of the syllable realizing the peak could be found, making an analysis 
in terms of association with a TBU unlikely. For both the question words in Brug- 
geman et al. (2017) and the tone-bearing words in Roettger's (2017) study, second- 
ary association to a higher prosodic constituent is also proposed: for the question 
words and the contrasted words in statements, this association is with the prosodic
word; for polar questions, it is with the IP (Roettger 2017: 137–140). Tones are also
aligned with the right edge of the domain they are associated with. 

Despite the rather complex situation in Tashlhiyt and several unsolved puzzles,⁴⁶
it seems that parallels can actually be drawn to the French case. In both
languages, an H tone (the AP-final one in French) seeks to occur rightmost in its 
domain and will associate with an available legitimate TBU. Assuming the account 
of schwa by Féry (2017), both languages have two different types of schwa, only one 
of which can serve as a TBU. While in French, that is essentially the only additional 
constraint able to prevent a rightmost placement of the H tone, in Tashlhiyt the seg- 
mental environment is much more adverse to tonal realization, resulting in cases 
in which the rightmost available TBU is actually placed before the target word. 
What counts as a legitimate TBU is thus affected by different factors in the two languages,
but the generalization that the tone tends to occur as far to the right as possible,
everything else being equal, makes a good case for the adequacy of the alignment 
constraints as proposed in the Gussenhoven model (cf. Grice et al. 2015: 260 for this 
argument, but Roettger 2017: 143-144 for a more sceptical view). The Tashlhiyt 
H tone in polar questions and contrastive statements would thus be classed, like 
the French AP-final H* pitch accent, under category (13)b, while its counterpart in 
wh-words would fall in category (13)d. 


3.5.3.3 Ambonese Malay 

Let us make a final comparison with another language without word stress, 
Ambonese Malay as discussed in Maskikit-Essed & Gussenhoven (2016). Using dura- 
tional and peak alignment measurements on target words from a task eliciting 
small read dialogues, they make the case that Ambonese Malay also has no word 
stress. In a comparison with data from a similar task made with Dutch speakers, 
where stress is uncontroversial, they find that peak alignment in Ambonese Malay 
does not correlate strongly with any segmental position in the word, especially not 
one in the penult, which has been proposed to bear stress in previous studies based 
on impressionistic analyses; the peak effectively varies relatively freely in its placement
on the last two syllables (Maskikit-Essed & Gussenhoven 2016: 364–366). This
leads them to propose that the H tone is aligned rightmost, but remains unasso- 
ciated, thus belonging to category (13)d. Note that even though we have already 
mentioned above that the Gussenhoven model does not allow for association with 
higher prosodic constituents, Maskikit-Essed & Gussenhoven (2016: 382) state that 


46 The most theoretically challenging one of which is perhaps that the Tashlhiyt data do not in any 
way support a deterministic relation between form and function in intonation, an issue which is 
discussed in Roettger (2017: 145–148).

“words are referred to as domains within which the rising-falling pitch movement
is placed”, because the peak always occurs within the target word and because its
distance from the word edge correlates with the duration of the final two syllables 
as well as of the word itself, more so than in the measurements on Dutch. If “reference
to a domain” is not to be proposed as a third, as yet somewhat vaguely defined,
means of tune-text linking (besides association and alignment), then this must be 
taken as a covert admission of association to higher-level prosodic constituents. In 
fact, this would not be a surprising finding if Grice et al. (2000: 148) turn out to 
be right in suggesting that (secondary) association to a TBU rather than a higher 
domain is typologically more frequent. 

Intonation in Ambonese Malay seems also not to signal prominence in contrastive
discourse contexts in terms of pitch scaling or any other means⁴⁷ (cf. Maskikit-Essed
& Gussenhoven 2016: 372–374; this is also in line with results from prominence
perception experiments on the related language Papuan Malay by Riesberg et al.
2018), and as far as their data allows them to say, the only functional differentiation
intonation makes in that language is one between declaratives, with a final low
boundary tone, and non-declaratives (polar questions and continuation rises), with
a final high boundary tone (Maskikit-Essed & Gussenhoven 2016: 377–380).

We have thus seen that also in languages without word stress, there is con- 
siderable variation in intonational systems in terms of association and alignment 
of different tones and their relation to various functions such as the signaling of 
prominence. It seems that the best measurable indicators for association are stable 
phonetic alignment with a consistently identifiable position, on the one hand, and 
“phonetic enhancement”, on the other. However, phonetic enhancement is certainly
not a necessary condition for association, as at least Japanese shows. In addition, 
neither is evidence for word stress: only when they indicate patterns that consist- 
ently point to a unique position in each lexical word can that connection be made. 
In absence of such a pattern, a postlexical pitch accent hypothesis is at least as likely, 
and more so when peak placement is discretely variable, as in Tashlhiyt Berber and 
French. Table 5 attempts to give an overview of the interrelation between association,
phonological and phonetic alignment to lexically/morphologically specified
positions in a word or postlexically determined ones in a larger prosodic unit, pho- 
netic enhancement and word stress or lexical accent via tones from various lan- 
guages discussed in this section. Two cells contain no data. For a type of tone to be 
entered there, it would need to exhibit continuously variable alignment that spans 


47 The results in Maskikit-Essed & Gussenhoven (2016: 372-374) show some variation between the 
speakers, in that for 2 out of the 4 speakers, the difference between focus conditions was actually 
significant, but with very small effect sizes (0.23 semitones on total average). In general, their small 
study size certainly leaves room for doubt until further results corroborate their findings.

Table 5: Attested possibilities for tonal association and alignment from some intonational systems.

Association to a TBU – stable phonetic alignment with a segmental or syllabic landmark

  lex./morph. specified position
    phonetic enhancement: starred tone(s) of pitch accent on a stressed syllable (Germanic and Romance, Cuzco Quechua (?))
    no phonetic enhancement: starred tone of pitch accent on a lexical accent position (e.g. Japanese, Turkish (Levi 2005)); H phrase tone in EEQT in Standard Greek (sec. assoc. to final stressed syl according to Grice et al. 2000)

  postlex. determined position
    phonetic enhancement: Tashlhiyt H in rise-falls (Roettger 2017); AP-final H in French (Welby 2006)
    no phonetic enhancement: Japanese AP-initial H (secondarily associated to a mora, according to Pierrehumbert & Beckman 1988, closely aligned according to Ishihara 2006); H phrase tone in EEQT in Standard Hungarian (sec. assoc. according to Grice et al. 2000); AP-initial L in French (Welby 2006); AP-initial H in Korean (Kim 2013); some phrase-final boundary tones (?)

Phonological alignment to a constituent edge (including another tone) – variable phonetic alignment across landmarks

  lex./morph. specified position
    phonetic enhancement: –
    no phonetic enhancement: some leading or trailing tones of a bitonal pitch accent on a stressed syllable (Germanic / Romance)

  postlex. determined position
    phonetic enhancement: –
    no phonetic enhancement: some phrase-final boundary tones (?); H phrase tone in EEQT in Standard Hungarian according to Gussenhoven-style analysis; H tone in Ambonese Malay (Maskikit-Essed & Gussenhoven 2016); H phrase tone in EEQT in Cypriot Greek; AP-initial H in French, AP-final L in French (Jun & Fougeron 2002; Welby 2006); H tone in Tashlhiyt Berber wh-words (Bruggeman et al. 2017)


at least two syllables, and at the same time there would have to be evidence that 
whatever syllable the tone is realized on is phonetically enhanced. That is concep- 
tually not impossible, but perhaps not overly likely. Spanish in general contributes 
to the data summarized in the table, but Quechua mostly represents a large gap in our
knowledge in this respect. I have tentatively included Cuzco Quechua among those
languages where phonetic enhancement occurs on the stressed and pitch accented
syllable, but this has not been conclusively shown (O'Rourke 2009: 298–299).

For Huari Quechua and Spanish I will present an analysis in section
6.1.6 that would allow us to locate them on this table and weigh in on the question 
of word stress in Huari Quechua. Since vowel length is contrastive in this Quechua 
variety (unlike in Cuzco Quechua), it should be less likely that duration will also be
exploited strongly to provide phonetic enhancement for pitch accented positions.
This seems to be corroborated by the results in Hintz (2006: 507), but see the find- 
ings in section 6.1.6 for a qualification. 


3.6 Recursive prosodic structure 


In this section, we will revisit the hierarchy of prosodic constituents initially intro- 
duced in section 3.2. We will consider arguments and evidence from several lan- 
guages that the prosodic units above the prosodic word can have a recursive struc- 
ture. In section 5.2 on complex Huari Spanish utterances, I will argue that the data 
presented there is also best analyzed as representing a recursive prosodic struc- 
ture. The discussion about recursive prosody and cues to it ties in both with that 
about tonal scaling and which role it plays in conveying phonological tones and 
prosodic constituents from section 3.4.2, as well as that about the role of prosody in 
the cueing of information structure, which we will come to in section 3.7. 


3.6.1 The Strict Layer Hypothesis and prosody-syntax mapping 


In the early days of the development of the prosodic hierarchy, the following prin- 
ciples were thought to universally hold for it, besides the hierarchy and its levels 
being themselves universal: 


(14) Architectural principles of the prosodic hierarchy (Nespor & Vogel 2007: 7) 
Principle 1. A given nonterminal unit of the prosodic hierarchy, Xᵖ, is composed
of one or more units of the immediately lower category, Xᵖ⁻¹.

Principle 2. A unit of a given level of the hierarchy is exhaustively contained 
in the superordinate unit of which it is a part. 

Principle 3. The hierarchical structures of prosodic phonology are n-ary 
branching. 

Principle 4. The relative prominence relation defined for sister nodes is such 
that one node is assigned the value strong (s) and all the other nodes are 
assigned the value weak (w). 


Principles 1 and 2 have also been formulated by Selkirk (1984: 26-27) under the 
heading Strict Layer Hypothesis (SLH). It claims that the prosodic hierarchy is 
strictly ordered and non-recursive: constituents of level nᵢ exhaustively dominate
all constituents of level nᵢ₋₁ in their domain and are themselves exhaustively dominated
by a constituent of level nᵢ₊₁, with the index on n being prohibited from skipping
a number, so that structure (15)a is well-formed, while structures (15)b–d are not
(deviant elements in bold⁴⁸):


(15) Possible and impossible prosodic structures according to the SLH
a. [ι[φ[ω[Σ σ σ]Σ [Σ σ σ]Σ]ω]φ]ι
b. *[ω[Σ σ σ]Σ [Σ σ σ]Σ σ]ω
c. *[ω[ω[Σ σ σ]Σ [Σ σ σ]Σ]ω]ω
d. *[ι[φ[ι[φ[ω[Σ σ σ]Σ [Σ σ σ]Σ]ω]φ]ι]φ]ι

While structure (15)a conforms to all stipulations of the SLH, (15)b does not, since
it contains a syllable (σ) that, while being contained inside a phonological word
(ω), is not contained within its next-higher constituent level, the foot (Σ). The level
of the foot has been skipped. (15)c, on the other hand, is disallowed because it is
recursive: a phonological word is contained within another phonological word. (15)d,
finally, is out because in it a constituent of the level of the intonational phrase
(IP) is dominated by a lower-level constituent, that of a phonological phrase (φ).
Claims for the attestation of structures like (15)b-d have all been variously made in 
the literature, but we will be mainly concerned with (non)-recursive structure here. 
Principle 3 has to be understood at least to some extent in the context of principles 
1 and 2: clearly, if there was a restriction to binary branching of constituents (as 
there still is, for example, in Liberman & Prince 1977), it would be impossible to 
maintain principles 1 and 2 with the same set of prosodic levels and without ques- 
tionable results, such as having to posit two different prosodic domains for white 
rabbit ((16)a) and white fluffy rabbit ((16)b), while the problem does not arise with 
n-ary branching ((16)c). 


(16) Differences between only binary and n-ary branching
a. [φ[ω white]ω [ω rabbit]ω]φ
b. [ι[φ[ω white]ω]φ [φ[ω fluffy]ω [ω rabbit]ω]φ]ι
c. [φ[ω young]ω [ω white]ω [ω fluffy]ω [ω little]ω [ω rabbit]ω]φ


There is another important consideration influencing the positing of principle 
3, however: the flatter structures created in this way are to reflect the finding 
that while there is a certain degree of correspondence between morphosyntactic 
structure and prosodic phrasing, the constituents of the prosodic hierarchy are 


48 In order to hopefully improve legibility I use double marking of brackets in this section, using 
the same symbols for the prosodic constituents as described in section 3.2 for subscription. Thus, 
[φ XYZ]φ marks one phonological phrase.

decidedly not isomorphic with morphosyntactic ones (Nespor & Vogel 2007: 1-2).
This is even held to be true for the phonological phrase, which is at the same time 
considered to be the most important interface category with the morphosyntactic 
derivation (Nespor & Vogel 2007: xx-xxi). In some subsequent work, this stipula- 
tion of non-isomorphism between syntax and prosody has been moved away from 
to a certain degree and instead, a close correspondence between the two levels 
has been espoused as the norm that can occasionally be deviated from. This is 
reflected for example in how the following optimality-theoretic constraints are 
formulated: 


(17) MATCH-constraints (Selkirk 2011: 439)
i. Match clause
A clause in syntactic constituent structure must be matched by a corresponding
prosodic constituent, call it ι, in phonological representation.
ii. Match phrase
A phrase in syntactic constituent structure must be matched by a corresponding
prosodic constituent, call it φ, in phonological representation.
iii. Match word
A word in syntactic constituent structure must be matched by a corresponding
prosodic constituent, call it ω, in phonological representation.


Here, “matching” means that both the left and right edges of the prosodic constituent
coincide with those of the morphosyntactic one. The idea of close mapping has
taken such hold that for instance in Féry (2017: 36), the prosodic levels above the 
foot are introduced with the ‘corresponding’ syntactic units as an indicator of their 
size; the prosodic word “corresponds roughly to a grammatical word”, the prosodic
phrase to a syntactic phrase (NP, VP, PP, AP etc.), the intonational phrase to a clause 
and the utterance to a paragraph. It is somewhat unfortunate for those wishing 
to assess this correspondence claim that ‘paragraph’ is a notion not regularly discussed
in syntax, and that even ‘clause’, which is much more ubiquitous, lacks a
definition relevant for prosody-syntax interactions in “current syntactic theory”, as
Myrberg (2013: 94) admits. Furthermore, it is clear that “corresponds roughly” must 
here in fact often mean ‘does not correspond at all’ when considering that utter- 
ances in actual conversation often consist of syntactic fragments only a few words 
long; nevertheless, they are utterances and intonational phrases: it is well-known 
in prosody research that recordings of words pronounced in isolation should not be 
used to draw conclusions about word-level prominences precisely because they are 
then always pronounced within their own intonational phrase, cf. Jun & Fletcher 
(2014: 495); Féry (2017: 179). Evidence from ellipsis phenomena also underscores 
the point that a close mapping to morphosyntax can be only one of several competing
constraints affecting prosodic phrasing.⁴⁹ This insight is captured in the nature
of these constraints as violable, as usual for OT. More recent studies have also reformulated
the SLH in a family of OT-constraints. Selkirk (1996: 189–190) proposes
the following four constraints, where Cᵢ is a prosodic constituent of level i, PWd is a
prosodic (phonological) word, Ft is a foot:


(18) SLH as violable OT-constraints (Selkirk 1996: 189–190)
(i) LAYEREDNESS No Cᵢ dominates a Cⱼ, j > i, e.g. “No σ dominates a Ft.”
(ii) HEADEDNESS Any Cᵢ must dominate a Cᵢ₋₁ (except if Cᵢ = σ), e.g. “A PWd must dominate a Ft.”
(iii) EXHAUSTIVITY No Cᵢ immediately dominates a constituent Cⱼ, j < i−1, e.g. “No PWd immediately dominates a σ.”


49 Consider the following short dialogues. In i), A's monosyllabic utterance would be uttered with
question intonation, i.e. IP-level boundary tones. B's answers are equally full IPs but of varying
syntactic status. That B can felicitously answer with a-d in i) but only a in ii) makes an account of
A's question as an elliptical version of a full question sentence doubtful because of question-answer
congruence (but cf. Gretsch 2003 for the same observation and a syntactic account). The “elliptical”
question by A encoded by the question intonation is clearly underspecified in a way which a “full”
syntactic version can never be, either as a wh-question (Where are the keys? – Here / # Yup) or a
polar question (Do you have the keys? – Yup / Here / # My pockets). In iii), it is clear that syntactic
accounts can hardly explain the infelicity of B's c. compared to a. A's question is here specified
by the context (with a strong bias for expecting the answer to be epistemically accessible to
the interlocutors but not the speaker) in a different way to how a “full” syntactic question would
be specified (Who has the keys? – You [declarative intonation]). The constraints for prosodic well-formedness
at the level of the IP are therefore primarily context-dependent and pragmatic, such
that information which is sufficiently encoded to provide a context update given the discourse
context can form an IP, but not less information (or more, according to the “exactly one idea” proposal
by Chafe 1994).
i. Context: couple leaving their flat together
A: Keys? [question intonation]
B: a. Here. b. Yup. c. Got them. d. My pockets [declarative intonation]
ii. Context: the couple have just closed the door behind them after leaving the flat and often forget to turn off the lights when leaving
A: Lights? [question intonation]
B: a. Yup. b. # Here. [declarative intonation]
iii. Context: three flatmates sharing one pair of keys leaving their flat together
A: Keys? [question intonation]
B: a. Me. b. You got them. c. # You. [declarative intonation] d. You [insistent intonation]


Here and elsewhere, the hash (#) before examples is used to indicate that the example is well- 
formed (grammatical) in the language, but infelicitous/pragmatically odd in the given context. The 
asterisk (*) in the same position indicates that the example is not well-formed (ungrammatical), in- 
dependent of context, according to what is assumed about the grammar of the language in question.
(iv) NONRECURSIVITY No Cⁱ dominates Cʲ, j = i, e.g. “No Ft dominates a Ft.”


For Selkirk (1996: 191), EXHAUSTIVITY and NONRECURSIVITY are violable con- 
straints, so that level skipping as in (15)b and recursive structures as in (15)c can be 
allowed in some languages if other constraints are ranked higher. The first two constraints,
LAYEREDNESS and HEADEDNESS, are however universally inviolable (i.e.
in all natural languages) in her conception.⁵⁰ This is also maintained in Myrberg
(2013), but she proposes to further complement EXHAUSTIVITY and NONRECUR- 
SIVITY with a slightly different constraint, EQUALSISTERS: 


(19) EQUALSISTERS (Myrberg 2013: 75)
Sister nodes in prosodic structure are instantiations of the same prosodic 
category. 
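The constraints in (18) and (19) can be read as counters of violations over a prosodic tree. The following Python sketch illustrates this reading; it is not part of Selkirk's or Myrberg's proposals. The class `Node`, the helper `chain`, the function `violations`, the category labels and their numerical ranking in `LEVELS` are assumptions made for this illustration, and counting one violation per offending node pair (or per mother node) is only one of several conceivable choices.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Hypothetical ranking of prosodic categories (higher number = higher level).
LEVELS = {"syl": 0, "Ft": 1, "PWd": 2, "PhP": 3, "IP": 4}


@dataclass
class Node:
    cat: str
    children: List["Node"] = field(default_factory=list)


def descendants(node: Node) -> Iterator[Node]:
    """Yield every node dominated by `node` (not only its daughters)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def chain(*cats: str) -> Node:
    """Build a non-branching tree, e.g. chain('Ft', 'syl') is a Ft over a syllable."""
    node = None
    for cat in reversed(cats):
        node = Node(cat, [] if node is None else [node])
    return node


def violations(root: Node) -> Dict[str, int]:
    """Count violations of the constraints in (18) and (19) in the tree under `root`."""
    v = {"LAYEREDNESS": 0, "HEADEDNESS": 0, "EXHAUSTIVITY": 0,
         "NONRECURSIVITY": 0, "EQUALSISTERS": 0}
    for node in [root, *descendants(root)]:
        i = LEVELS[node.cat]
        below = [LEVELS[d.cat] for d in descendants(node)]
        v["LAYEREDNESS"] += sum(1 for j in below if j > i)
        v["NONRECURSIVITY"] += sum(1 for j in below if j == i)
        # The lowest category (the syllable) is exempt from HEADEDNESS.
        if i > 0 and (i - 1) not in below:
            v["HEADEDNESS"] += 1
        v["EXHAUSTIVITY"] += sum(1 for c in node.children if LEVELS[c.cat] < i - 1)
        # One EQUALSISTERS violation per mother whose daughters differ in category.
        if len({c.cat for c in node.children}) > 1:
            v["EQUALSISTERS"] += 1
    return v


# A maximal IP containing two minimal IPs (recursion, no level skipping).
recursive_ip = Node("IP", [chain("IP", "PhP", "PWd", "Ft", "syl"),
                           chain("IP", "PhP", "PWd", "Ft", "syl")])
# A PWd that directly dominates a syllable next to a foot (level skipping).
skipping_pwd = Node("PWd", [Node("syl"), chain("Ft", "syl")])

print(violations(recursive_ip))
print(violations(skipping_pwd))
```

In this sketch, the recursive IP incurs only NONRECURSIVITY violations, while the level-skipping prosodic word violates EXHAUSTIVITY and EQUALSISTERS but not NONRECURSIVITY.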


EQUALSISTERS is argued by Myrberg (2013: 78) to be able to explain preference pat- 
terns for prosodic structures in Stockholm Swedish which EXHAUSTIVITY and NONRE- 
CURSIVITY cannot differentiate between (the latter but not the former being violated 
uniformly in all of them). She argues that these structures are recursive (with an into- 
national phrase containing one or two intonational phrases) based on the distribution 
of IP-initial and IP-final as well as lexical and focal pitch accents and their downtrend 
behaviour (Myrberg 2013: 98-100, 108—110), i.e. on observable empirical grounds. 


3.6.2 Empirical evidence for recursive prosodic structure from systematic 
pitch scaling 


While some of the arguments for recursive prosodic structure in the literature are 
clearly motivated by maintaining a tight mapping between syntax and prosody and are 
principally theoretical, there is another line of arguments for it that is based on empir- 
ical findings and can be evaluated independently of assuming such a tight mapping. 
They mostly come from findings about pitch scaling, which is an area that AM is still 


50 Féry (2015) discusses examples from non-extraposed German relative clauses that she calls 
“prosodic monsters” and which she argues also violate Layeredness (i.e. structures like (15)d) by
containing an IP within a phonological phrase. An important point she wishes to make is that syn- 
tax and prosody interact as equals: syntactic and prosodic constraints must be in the same ranking 
hierarchy in order to account for the acceptability differences between the examples she gives. 
However, her analysis of the prosodic phrasing is based only on a close mapping between syntactic 
and prosodic categories and she does not provide any measurable intonational evidence against 
alternative analyses with less complex prosodic structure and a looser mapping.
struggling to gain a solid theoretical grasp on (cf. section 3.4.2). We will review some of
those that argue for a recursive category at the top of the prosodic hierarchy (IP) in the
following. Terminologically, “downstep” will be used to label a process analyzed as phonological
and resulting in a lowering of pitch level between two successive units; “declination”
is the phonetically (physiologically) determined lowering of pitch over longer
stretches, and “downtrend” is used to describe either of these without committing to
an underlying cause. “Upstep” is the counterpart of downstep in the other direction.


3.6.2.1 Stockholm Swedish 
Myrberg (2013) argues for prosodic recursion based on Stockholm Swedish data 
such as the utterance shown in Figure 11. Swedish has a lexical pitch accent so that 
content words belong to one of two accent classes, called accent I and accent II. IPs 
are delimited at their left edge by a so-called “initiality accent” located on the first 
prosodic word, L*H on an accent I word, and H*LH on an accent II word. At their 
right edge, they are delimited by a high or low boundary tone (H% or L%). The most 
prominent phonological phrase (PP here) in an IP is assigned a “focal accent”
on its rightmost prosodic word, which has almost the same form as the “initiality
accent”. The “initiality accent” is argued not to be the same as the “focal accent”
because, unlike the latter, its position is determined only by the position in the IP
and not at all by information-structural criteria, so that it also appears on backgrounded
and given words. Its final H is also aligned later and more variably than that of
the “focal accent” (Myrberg 2013: 85-87, cf. also Horne et al. 2001; Roll et al. 2009).
In utterances like the one shown in Figure 11, from a corpus of read utterances by
three speakers (n=27, cf. Myrberg 2013: 93-94), the tonal movement realizing the
initiality accent is produced twice (on andra and Anna), as is that of the focal accent
(on utklädda, which has a secondary stress on the second syllable leading to
the association of the L tone as L*, and on med). The focal accent pitch movements
are both followed by clear low valleys indicating an L% tone at the right boundary
of the IP. Judging from the presence of these initial and final pitch movements,
it is thus clear that these examples consist of two IPs, each extending over one
of the main clauses making up the coordinated syntactic structure (see the lower
part of Figure 11). However, all of these utterances (Myrberg 2013: 98) also display a
steady downtrend in the relative scaling of their local pitch maxima and minima, 
as shown by the dashed lines in the pitch track in the upper part of Figure 11. The 
fact that this downtrend continues across the utterance and that no or only partial 
pitch reset takes place at the boundary between the two IPs is taken to be evidence 
for the presence of a third, larger, IP which fully contains the two smaller ones. 
Pitch reset normally occurs at the boundary between IPs in Swedish (Myrberg 
2013: 108-109), and is also taken as a universal cue for IPs by Himmelmann et al.

(b) [[De 'andra skulle vara 'utklädda] så [Anna ville inte vara 'med]]


Figure 11: Pitch track of a Stockholm Swedish utterance, de andra skulle vara utklädda så Anna ville
inte vara med “the others were getting dressed up, so Anna didn't want to join”, with dashed lines
indicating downtrend of the pitch maxima and minima over the time course (a), and its proposed 
phrasing as two coordinated minimal IPs in a maximal IP (b), adapted from Myrberg (2013: 97, 99). 


(2018). That it does not occur in the Swedish examples is thus evidence that as a 
whole they consist of a constituent in whose domain reset is suspended, i.e. an IP. 


3.6.2.2 British English 

A very similar argumentation has been brought forward by Ladd (1986, 1988, 1990, 
1993, 2008: 294-297) based on evidence from somewhat more complex English 
data. Ladd (1988) presents read speech data from speakers of British English in 
two conditions, the “A and B but C" condition and the “A but B and C" condition, 
exemplified by (20)a and (20)b, respectively. Their hierarchical structure⁵¹ is given
in (20)c, intended to show that the clauses connected by and are sister nodes under 
a higher constituent, while the clause preceded by but is a sister node at the level of 


51 Note that Ladd (1988: 531) argues for the hierarchical asymmetry in clause status and boundary
strength based on semantic and pragmatic considerations, not on syntactic ones: “The most natural
interpretation of these sentences appears to be one in which the but opposes one proposition to 
the two conjoined by and [. . .]”. The experimental materials from all the studies in this section can 
be interpreted in this way, suggesting that hierarchical pitch scaling at least partially serves the 
function of cueing discourse cohesion and coherence.
that constituent. This leads to the intuitive understanding of the boundary before
but as stronger than the one before and in these constructions, which Ladd (1988: 
531) hypothesizes is reflected in pitch scaling. 


(20) “A but B and C" and “A and B but C” sentences from Ladd (1988: 532) 
a. Allen is a stronger campaigner, and Ryan has more popular policies, but 
Warren has a lot more money 
b. Ryan has a lot more money, but Warren is a stronger campaigner, and 
Allen has more popular policies 


c. A and B but C: [[A and B] but C]    A but B and C: [A but [B and C]]


When read by Ladd's test subjects, the sentences were consistently produced with 
several pitch accents on the prominent words and a final falling boundary move- 
ment at the end of each clause, indicating that each clause is produced as an IP 
(Ladd 1988: 532). In two experiments, the first one involving examples with three 
accented words per clause (4 speakers x 18 sentences x 2 conditions = 144 utter- 
ances), the second with four accented words per clause (same number of utter- 
ances), peak measures taken for the accents showed a downtrend across each IP 
corresponding to an individual clause, but with partial, not full, reset between 
the IPs, indicating them all to be part of a larger constituent in which full reset is 
suspended (Ladd 1988: 535, 539). Average peak measurements on the first accent 
following a but-boundary were higher than those on the corresponding accent
in the other condition, when it followed an and-boundary, and this difference in 
boundary strength was also mostly supported by durational measurements at the 
boundaries (Ladd 1988: 535—536, 539). These differences were shown to be statisti- 
cally significant in the second experiment for all speakers (significant interaction 
in ANOVA of sentence type x clause type x accent and/or sentence type x clause 
type), and for three in the first (Ladd 1988: 535, 539). A third experiment served 
as a control to ensure that it is not a local effect of but which is responsible for the 
observed scaling differences, but indeed the hierarchical structure. 
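The distinction between full, partial and absent pitch reset that this argument relies on can be made explicit in a small sketch. The following Python functions are an illustrative operationalization only, not Ladd's measurement procedure: the function names, the tolerance parameter `tol` and the peak values in Hz are assumptions introduced here.

```python
from typing import Sequence


def reset_ratio(first_ip: Sequence[float], second_ip: Sequence[float]) -> float:
    """Share of the downtrend across the first IP that is undone at the boundary.

    0 means the second IP starts at the last peak of the first IP (no reset),
    1 means it starts at the height of the first IP's initial peak (full reset).
    """
    start, end = first_ip[0], first_ip[-1]
    if start == end:
        raise ValueError("no downtrend in the first IP, ratio is undefined")
    return (second_ip[0] - end) / (start - end)


def classify_reset(first_ip: Sequence[float], second_ip: Sequence[float],
                   tol: float = 0.05) -> str:
    """Label the boundary as 'full', 'partial' or 'none' (hypothetical thresholds)."""
    ratio = reset_ratio(first_ip, second_ip)
    if ratio >= 1 - tol:
        return "full"
    if ratio <= tol:
        return "none"
    return "partial"


# Invented accent peak values (Hz) for two successive clauses.
clause_a = [220, 200, 180]
clause_b = [205, 190, 170]
print(reset_ratio(clause_a, clause_b), classify_reset(clause_a, clause_b))
```

On such a measure, the hypothesis that the boundary before but is stronger than the one before and corresponds to a larger ratio at the but-boundary.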


3.6.2.3 Northern German 

Truckenbrodt & Féry (2015) (see also Féry & Truckenbrodt 2005) essentially repli- 
cate Ladd's (1988) study with German data, using similar experimental conditions 
in a reading task with 5 northern German speakers as subjects.
(21) Example sentences for the three experimental conditions from Truckenbrodt
& Féry (2015: 25-27), underlined syllables are expected to be accented

a. AX condition (Ladd's A but B and C condition): A während [B und C]X
Context: Warum meint Anna, dass Handwerker teurere Autos haben als Musiker?
‘Why does Anna think that craftsmen have more expensive cars than musicians?’
Sentence: Weil der Maler einen Jaguar hat, während [die Sängerin einen
Lada besitzt, und der Geiger einen Wartburg fährt]
‘Because the painter has a Jaguar, while [the singer owns a Lada and the
violinist drives a Wartburg]’

b. XC condition (Ladd's A and B but C condition): [A und B]X während C
Context: Warum meint Anna, dass Musiker teurere Autos haben als Sportler?
‘Why does Anna think that musicians have more expensive cars than sportsmen?’
Sentence: [Weil die Sängerin einen Jaguar hat, und der Geiger einen
Daimler besitzt], während der Ringer einen Lada fährt
‘[Because the singer has a Jaguar and the violinist owns a Daimler], while
the wrestler drives a Lada’

c. No-X condition (control condition)
Context: Warum meint Anna, dass ihre Nachbarn teure Autos haben?
‘Why does Anna think that her neighbours have expensive cars?’
Sentence: Weil Möller und Hummel einen Jaguar haben, Meyer und Lerner
einen Daimler besitzen, und Wollmann und Lehmann einen BMW fahren
‘Because Möller and Hummel have a Jaguar, Meyer and Lerner own a
Daimler, and Wollmann and Lehmann drive a BMW’


As the examples in (21) show, their experimental sentences all create an argumen- 
tative contrast between two propositions one of which, X, is itself a coordination of 
two propositions. Each proposition is expressed through an individual clause, the 
two coordinated ones separated by und “and”, and the contrasting one separated by 
während “while”. In the control condition, no argumentative contrast exists. As in 
Ladd (1988), each clause is realized with several pitch accents (here, one prenuclear 
one and a nuclear one on the final accented word in the IP, both L*H) and a separate 
final boundary tone (H% at the end of the prefinal and L% at the end of the final IP) 
and thus analyzed as forming IPs (Truckenbrodt & Féry 2015: 27-29). One important 
difference to Ladd's experiment is that because in German, the nuclear accent in an
IP is upstepped (Féry & Kügler 2008), the scaling of this upstep⁵² is also hypothesized
to be affected by the proposed hierarchical prosodic structure (Truckenbrodt & Féry 
2015: 24), in addition to the scaling on the IP-initial accents. Their results confirm 
Ladd's in that the difference in boundary strength is reflected in the scaling of the 
initial accent between the two conditions (see A in Figure 12), and no such effect is 
found in the control condition, which shows downtrend with partial reset across the 
three IPs, as in Swedish (Truckenbrodt & Féry 2015: 30). In order to model this effect,
they use the concept of phrasal reference lines, first proposed in van den Berg et al. 
(1992) for Dutch. There it was observed that peak values in downstep contexts seem 
to orient themselves towards an abstract reference line extending across a phrasal 
constituent. Truckenbrodt & Féry (2015: 31-32) analyze their results as showing that 
a phonological process of downstep is responsible for the observed effects. Down- 
step is implemented as a lowering of the respective phrasal reference lines. It only 
applies between sister nodes in the prosodic structure (as also proposed by Ladd 
1988: 541-542), lowering the second sister, and its effects manifest as less strong 
between higher constituents than between lower ones.⁵³ This is schematically represented⁵⁴
in B in Figure 12. In the AX condition, lowering takes place between the
IPs A and X (= B + C) and between B and C. In the XC condition, it takes place within
X, i.e. between A and B, and between X and C, but not between B and C, as they are 
not sisters. The actual pooled values reflect this: the initial peaks in B and C in the AX 
condition are both lowered with respect to the preceding initial peak, whereas in the 
XC condition, they are at the same level, and B is lower here than in the AX condition. 
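The logic of downstep between sister nodes can be illustrated with a toy computation of phrasal reference lines. The following Python sketch is not the authors' model: the function `reference_lines`, the nested-list encoding of the prosodic trees and the lowering factors in `factors` are assumptions made for this illustration. It lets the amount of lowering depend on depth of embedding, with less lowering between higher constituents, and it ignores final lowering and declination.

```python
from typing import Dict, List, Sequence, Union

Tree = Union[str, List["Tree"]]


def reference_lines(tree: Tree, top: float = 1.0,
                    factors: Sequence[float] = (0.9, 0.8),
                    depth: int = 0) -> Dict[str, float]:
    """Return the relative height of the reference line of each clause.

    A clause is a string, a larger constituent is a list of sisters. Every
    sister after the first is lowered relative to the sister preceding it;
    `factors[depth]` is the hypothetical lowering factor at that depth.
    """
    if isinstance(tree, str):
        return {tree: top}
    factor = factors[min(depth, len(factors) - 1)]
    lines: Dict[str, float] = {}
    level = top
    for position, sister in enumerate(tree):
        if position > 0:
            level *= factor  # downstep applies only between sisters
        lines.update(reference_lines(sister, level, factors, depth + 1))
    return lines


AX = ["A", ["B", "C"]]   # A während [B und C]
XC = [["A", "B"], "C"]   # [A und B] während C

print("AX:", reference_lines(AX))
print("XC:", reference_lines(XC))
```

With these hypothetical factors, the values fall from A to B to C in the AX structure, whereas in the XC structure B is lower than in AX and C is not lowered relative to B, since B and C are not sisters there.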

The results on the upstepped peaks also confirm the general predictions of this 
model but show some additional variation. In the AX condition, upstepped IP-final 
nuclear peaks are roughly at the same height as the preceding initial peaks (no 
significant differences in paired t-tests), but in the XC condition, the height of the 
nuclear accent in the second clause (B), labeled H4 in Figure 13, varies. For two 
speakers, it is significantly higher than the preceding IP-initial peak H3 and not 
significantly different from H1, the initial peak in the first clause, while for two 
others, it is the other way around, with H4 being at broadly the same height as H3 


52 The nuclear accent in the experiment sentences always fell on a lexical item signifying a car 
name in each IP, which was contrasted with a different item of the same type in the other clauses. 
53 Correlating boundary strength with amount of lowering is the proposal by Ladd (1988), re- 
formulated via depth of embedding in Féry & Truckenbrodt (2005). Truckenbrodt & Féry (2015: 
32-33) relate the difference between the initial accents in B in the two conditions to an additional 
stipulation of final lowering (Liberman & Pierrehumbert 1984) applying to the phrasal reference 
line of the last sister under a node. The different proposals do not make any differing predictions 
with regards to the data discussed in Truckenbrodt & Féry (2015). 

54 In reality, the reference lines should presumably not be thought of as horizontal, but as always including
some slowly increasing amount of decay, the effect of phonetic declination over any utterance.
Figure 12: A: Mean values and 95% confidence intervals for the IP-initial peaks, normalized and
pooled values of speakers S1-S5 for the two conditions AX (n=67) and XC (n=66). H1, H3, and H5 are
the respective initial peaks of the clauses A, B and C, from Truckenbrodt & Féry (2015: 31).

B: Schematized phrasal reference lines and IP-initial peaks for the two experimental conditions, from 
Truckenbrodt & Féry (2015: 32). 


and lower than H1 (Truckenbrodt & Féry 2015: 35). This is reflected in the average 
value for H4 in the XC condition seen in B in Figure 13. The interpretation is that the 
height of H4 is variably orientated either towards the higher reference line of X, or 
the downstepped one of B in the XC condition, as shown in A of Figure 13. The fifth 
speaker always upstepped H4 almost to the height of H1 in both conditions, point- 
ing once again to the need for further studies with more speakers to illuminate the 
role of inter-speaker variation in these phenomena. 


3.6.2.4 European Portuguese 

European Portuguese is a Romance language for which comparable empirical 
findings about prosodic boundary strength have been connected to a hypothesis 
about prosodic recursion (Frota 2012, 2014). This argument is based on gradual 
differences corresponding to boundary strengths not only with respect to pitch 
scaling, but also phrase-final lengthening and even the likelihood of the application 
of segmental prosodic processes. The IP in European Portuguese, according to Frota 
(2014: 11-13), is the domain of a number of processes, including the following: a) it 
is the domain at which at least one pitch accent is obligatory, on the strongest word 
in the final PhP in the IP; b) it has an obligatory boundary tone at its right edge and 
an optional one at its left edge; c) word-final fricatives are voiced when followed 
by a word-initial vowel within the IP, but voiceless at its edges; d) it is the domain 
of final lengthening; and e) clitic elements have a tendency to be realized in their 
stronger forms when positioned at the left edge of the IP. The distribution of pitch 
accents and boundary tones (a, b) is shown to agree with the phrasing cued by fricative
voicing (c) and final lengthening (d) in (22)a and b and the corresponding pitch
tracks given in Figure 14. The tonic and posttonic syllables of the words preceding 
an IP-boundary as marked in (22)a and b are lengthened (underlined in the exam- 
ples), a pitch accent is realized on the final stressed syllable followed by a boundary
Figure 13: A: Schematized phrasal reference lines and upstepped nuclear peak values for the two
experimental conditions, from Truckenbrodt & Féry (2015: 34). B: Mean values and 95% confidence
intervals for upstepped nuclear peaks H2 (clause A) and H4 (clause B) in relation to the three IP-initial
peak values H1, H3, and H5 in the normalized pooled data of speakers S1-S4. (a) AX condition (n=51);
(b) XC condition (n=50), from Truckenbrodt & Féry (2015: 35).


tone, as evidenced by the pitch movements in Figure 14, and IP-final fricatives are 
realized as voiceless [ʃ], while IP-internally, they are realized as voiced [z].

The difference between (22)a and (22)b is proposed to be that while in (22)a, all
IPs are of equal status, in (22)b the “short IP” (Frota 2014: 11) as alunas “the students”
is embedded within the larger IP extending until até onde sabemos, forming
a recursive “compound prosodic domain” (Ladd 2008: 297-309; Frota 2012, 2014).
The short IP is argued not to be an entirely different category, e.g. an intermediate 
phrase, because the differences to a longer IP are mostly only gradual: while fric- 
ative voicing extends throughout the larger IP domain in (22)b (visible also in the 
clearly different spectral quality of the segment corresponding to the fricative in 
B compared to A in the original pitch tracks⁵⁵ in Frota 2014: 13), final lengthening


55 Voice onset time (VOT), the phonetic exponent of voicing, is of course also a continuous parame- 
ter. It might turn out that both VOT and the place difference between [z] and [ʃ] as realized in terms
Figure 14: Two versions of an elicited European Portuguese utterance, corresponding to the
proposed phrasing structures in (22)a (A) and (22)b (B). Adapted from Frota (2014: 13). 


and pitch scaling are decreased (but still present) at the right edge of the short IP in 
comparison to the right edges of larger IPs, as exemplified in Figure 14. 


(22) European Portuguese example sentences with different proposed phrasings
(Frota 2014: 14)
a. { a[z] aluna[ʃ] }IP { até onde sabemo[ʃ] }IP { obtiveram boa[z] avaliaçõe[ʃ] }IP
b. {{ a[z] aluna[z] }IP { até onde sabemo[ʃ] }IP }IP { obtiveram boa[z] avaliaçõe[ʃ] }IP
“The students, as far as we know, have got good grades”
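The interaction between phrasing and fricative voicing in (22) can be spelled out as a small rule application. The following Python sketch is a simplification made for illustration and not Frota's analysis: the function `realize_final_fricatives` and the list `VOWELS` are assumptions, word-final orthographic s stands in for the word-final fricative, and only the two contexts named in the text (before a vowel-initial word within the largest IP, and at the right edge of that IP) are covered.

```python
from typing import List, Optional, Tuple

# Orthographic vowels that may begin a word (a rough stand-in for phonology).
VOWELS = set("aeiouáàâãéêíóôõú")


def realize_final_fricatives(
    ips: List[List[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Return (word, realization) pairs for all words ending in a fricative.

    `ips` lists the largest (maximal) IPs of an utterance, each as a list of
    words. The realization is 'ʃ' at the right edge of the IP, 'z' before a
    vowel-initial word inside the IP, and None where the text states no rule.
    """
    result: List[Tuple[str, Optional[str]]] = []
    for ip in ips:
        for index, word in enumerate(ip):
            if not word.endswith("s"):
                continue
            if index == len(ip) - 1:
                result.append((word, "ʃ"))       # IP-final: voiceless
            elif ip[index + 1][0].lower() in VOWELS:
                result.append((word, "z"))       # IP-internal before a vowel: voiced
            else:
                result.append((word, None))      # context not covered here
    return result


# (22)a: three IPs of equal status.
phrasing_a = [["as", "alunas"], ["até", "onde", "sabemos"],
              ["obtiveram", "boas", "avaliações"]]
# (22)b: the short IP "as alunas" is contained in a larger IP.
phrasing_b = [["as", "alunas", "até", "onde", "sabemos"],
              ["obtiveram", "boas", "avaliações"]]

print(realize_final_fricatives(phrasing_a))
print(realize_final_fricatives(phrasing_b))
```

The two phrasings differ only in the realization of the final fricative of alunas, which comes out as voiceless in (22)a and as voiced in (22)b, because the sketch evaluates voicing with respect to the larger IP.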


For the fifth boundary cue given above, the realization of clitics in their strong or 
reduced form (e), the data presented in Frota (2014) does not directly bear on the 
question of whether the IP is recursive or not in European Portuguese. However, 


of spectral quality are actually also subject to gradual effects of boundary strength in European 
Portuguese utterances like these.
the percentages given with the examples in (23) indicate that this is also not a discrete
but a gradual difference, which only emerges when quantified: in (23)a, the
clitic element in question, a os “to the”, is placed IP-medially and realized in the
strong, i.e. diphthongized, form [awʃ] in 20% of the elicitations, and in the reduced,
monophthongized form [əʃ] in 80% of them. When the constituent a os jornalistas “to the
journalists” receives its own phrase, either fronted as in (23)b or in the same position
((23)c), which in both cases effects a topical interpretation according to Frota
(2014: 14), so that the clitic elements are IP-initial, the share of strong forms rises to 88% and
92%, respectively. Although cases testing this are not reported on in Frota (2014), the
difference between 20% and around 90% is large enough to hypothetically allow for
intermediate prevalences (say, around 55%) that might characterize the behaviour
of cases that are in-between: at the left edge of the second element in a compound
IP (comparable to the beginning of até onde sabemos in (22)b), for example. In such
a position, a constituent like a os jornalistas would be adjacent to an initial boundary,
unlike in (23)a, but this boundary would be weaker than the boundary the constituent
is adjacent to in (23)b and (23)c following Frota's analysis. Should this turn out to
be the case, it would indicate that boundary strength of recursive domains is also
expressed by the quantitative prevalence of an otherwise discrete phenomenon.⁵⁶


(23) Three European Portuguese test sentences with the reported prevalence of the
realization of the clitics a os “to the” in the strong form [awʃ] instead of the reduced
form [əʃ], from Frota (2014: 14)

a. [ a[z] angolana[z] ofereceram especiaria[z] [əʃ] jornalista[ʃ] ]IP
“The Angolan women offered spices to the journalists” (20% occurrences
diphthongized)

b. [ [awʃ] jornalista[ʃ] ]IP [ a[z] angolana[z] ofereceram especiaria[ʃ] ]IP
“To the journalists, the Angolan women offered spices” (88% occurrences
diphthongized)

c. [ a[z] angolana[z] ofereceram especiaria[ʃ] ]IP [ [awʃ] jornalista[ʃ] ]IP
“The Angolan women offered spices, to the journalists” (92% occurrences
diphthongized)


56 Although this is presumably a simplification. Just as VOT and fricative spectral energy are locat- 
ed on a continuum, so are the differences in formant frequency change between [ə] and [aw]. If it
were the case that non-maximal IP boundaries really resulted in an intermediate rate of monophthongization,
it would be interesting to see whether measurements of these continuous parameters
also revealed intermediate realizations. This would then presumably be reflected quantitatively as
differing rates of realization of a discrete variable because of our categorical perception.
3.6.2.5 (Tokyo) Japanese

For (Tokyo) Japanese, Kubozono (1989) reports on findings that again also involve tonal 
scaling. It is usually assumed that the intermediate phrase in Japanese is the domain of 
downstep (Beckman & Pierrehumbert 1986; Venditti 2005), such that the pitch accents 
and initial rise of a second accentual phrase within the same intermediate phrase are 
realized at a lower pitch than those of the preceding one. However, when they are 
grouped into two separate intermediate phrases, the second AP's tones are realized 
at the same height as that of the preceding one, and reset takes place. In addition, the
perceived disjuncture between words belonging to two separate intermediate phrases
is larger than between two APs, e.g. pauses are more likely to occur or to be longer⁵⁷
(Venditti 2005: 176, 185-186). Kubozono's (1989) findings show that phrases composed
of right-branching complex noun phrases containing only accented words (i.e. made
up of as many APs, since it is usually taken to be definitional of APs to contain a single
lexical pitch accent, cf. Ito & Mester 2012: 284) such as (24)a, realize a “pitch boost”, an
effect similar or equivalent to upstep, on the second accent, i.e. the one following the
stronger boundary, while those composed of left-branching noun phrases, such as (24)
b, do not. This upstep also takes place on the third accent in phrases realizing balanced 
structures, such as (24)c, again after the stronger boundary. 


(24) Japanese test phrases with prosodic bracketing, adapted from Kubozono (1989)
a. right-branching complex noun phrase⁵⁸
iP[AP[kowa'i]AP]iP iP[AP[me-no]AP AP[ya'mai]AP]iP
terrible eye-GEN disease
“terrible eye disease” (Kubozono 1989: 42)


57 Note however that Venditti (2005: 186) calls pauses neither a necessary nor a sufficient condi- 
tion for the perception of a disjuncture of the kind that occurs between intermediate phrases. She 
also points out that there are cases where the perceived disjuncture does not match with the tonal 
cues assumed for a given category. 

58 Kubozono (1989) locates the difference between these three types of sentences in the syntax, 
and assumes that this is then what is reflected in differing prosodic structures. The argument for 
syntactic difference however is only built on the lexical semantics and world knowledge: in kowai 
me-no yamai “terrible eye disease”, kowai “terrible” is only understood to modify me-no yamai 
because that is the most likely interpretation. A marginal interpretation of “the disease of terrible 
eyes” is presumably possible. If kowai were replaced by akai “red”, the noun phrase would most 
likely be understood as “the disease of red eyes" and correspondingly parsed and phrased as [akai 
me-no][yamai], because redness is something more likely to be said of eyes than of an eye disease. 
Similarly, in the other two examples, replacement of one of the lexical items by another of the 
same class, without changing anything in the observable syntactic structure, would result in an 
interpretation with different branching. It would be interesting to see if different kinds of prosodic 
phrasing were available in ambiguous cases, such as a variant on (24)b, kanbashii ayu-no nioi, 
where kanbashii “aromatic” could modify either ayu or ayu-no nioi.
b. left-branching complex noun phrase
iP[AP[na'mano]AP AP[a'yu-no]AP]iP iP[AP[nioi]AP]iP
raw type.of.fish-GEN smell
“smell of raw Ayu” (Kubozono 1989: 44)

c. balanced complex noun phrase
iP[AP[na'oko-no]AP AP[ani-no]AP]iP iP[AP[aoi]AP AP[erimaki]AP]iP
Naoko-GEN older.brother-GEN blue muffler
“Naoko’s older brother’s blue muffler” (Kubozono 1989: 51)


Parallel effects of prosodic boundary strength on the scaling of the AP-initial rise 
were found by Selkirk et al. (2003) in a reading task with differing syntactic struc- 
tures. They found that scaling depended on whether the AP was initial or non-in- 
itial in an iP. Based on these findings by Kubozono (1989) and Selkirk et al. (2003), 
Ito & Mester (2012) propose to reduce the two levels of the AP and iP in Japanese
into one, as recursive instantiations of φ, the phonological phrase. In their interpretation,
downstep applies at that level. Because it is definitional for the minimal
φ (=AP) to contain only one lexical pitch accent (and one phrase-initial accent) and
downstep only applies between two pitch accents, they argue that the effect of
downstep is only visible at a recursive instance of a non-minimal φ, i.e. a φ that
contains at least another φ, and the scaling of the AP-initial rise is increased with
increasing levels of recursion of the φ (Ito & Mester 2012: 286). The same boundary
strengthening effect is what they invoke to explain the upstep in Kubozono's data 
(Ito & Mester 2012: 294). 

Independent of prosodic recursion, all of these studies provide evidence that 
initial prosodic strengthening (Fougeron & Keating 1997, cf. also 3.2.2) also affects 
pitch scaling and is sensitive to differing boundary strengths, including differences 
that are not predicted by the traditional prosodic hierarchy. They also provide evi- 
dence that pitch scaling, especially in the case of regular downtrend, cannot be 
modeled purely locally, in contrast to what Pierrehumbert (1980) and Liberman &
Pierrehumbert (1984) had suggested, and that it instead seems to make reference
to a hierarchical organization of larger prosodic units. From there, assuming pro- 
sodic recursion proceeds by two main arguments: for one, the fact that the pro- 
sodic unit of the smaller subparts and that of the whole utterance do not differ 
in their discrete cues (boundary tones or constraints on accent placement within 
them) but only gradually in their boundary strength as cued by pitch scaling of the 
boundary-adjacent movements, duration and degrees of pitch reset, is seen as most 
compatible with an account taking them all as instantiations of the same recursive 
phrasal category (Ladd 1986: 327—328; Frota 2012: 260—261; Ito & Mester 2012: 294; 
Myrberg 2013: 108-110; Truckenbrodt & Féry 2015: 37). Secondly, if the component 
units were taken to be categorically different from the overarching unit based
on these gradual differences, then sufficiently long utterances with sufficiently
complex structure would require postulating a large number of additional cate- 
gories ad hoc, with no principled way of accounting for their similarities (the fact 
that they all share the same boundary phenomena with only gradual differences, 
cf. Ladd 2008: 294-295). 


3.6.3 Separating arguments for prosodic recursion from assumptions 
of universality and a close prosody-syntax mapping 


I would like to restrict arguments for recursive prosodic phrasing to those based 
on the kind of evidence described in the preceding sections, because they require 
the fewest theoretical presuppositions. There are other works invoking recursive 
prosodic structure at a level above the prosodic word for the analysis of their obser- 
vations, but they are based on additional theoretical assumptions. For example, 
Elfner (2015) proposes recursive phonological phrases to account for the distribu- 
tion of phrase-initial LH and phrase-final HL pitch accents in Connemara Irish: the 
HL accent appears on the stressed syllable of the final word in the phonological 
phrase, but the LH accent only appears on the stressed syllable of the leftmost word 
in a φ that is not minimal, i.e. one that dominates another φ (Elfner 2015: 1180,
1182), similar to the argument in Ito & Mester (2012). While her analysis seems 
well-suited to the data she presents, this sort of evidence is of a somewhat different 
nature: instead of gradual differences between instantiations of what is the same 
prosodic category by all other cues, as we have seen in the other cases discussed in 
this section, in the Irish case under Elfner's proposal, the different recursive levels 
of φ actually differ categorically, in their tonal makeup (LH initial in all φs, LH
initial and HL final in all non-minimal φs). Bickel et al. (2009); Schiering et al. (2010)
argue for up to 4 prosodic units close to the prosodic word in Limbu (Sino-Tibetan), 
based on the observation that they all serve as the application domain of quite dis- 
tinct processes. They discuss and explicitly discard the option to solve this puzzle 
via a recursive prosodic word domain, precisely because the domains in question 
do not have the same phonological properties (Bickel et al. 2009: 50). However, the 
necessity for a recursive solution here only even arises when there is some incen- 
tive to apply the same set of prosodic categories universally for all languages. That 
is clearly the case for Ito & Mester (2012: 287–289), who posit recursive versions
of the prosodic word (ω), phonological phrase (φ) and intonational phrase (ι) as
universal interface categories with the syntax and who explicitly reject the pos- 
tulation of prosodic categories based only on observable phonetic effects. Elfner 
(2015: 1203) equally voices some optimism that such a universal reduction would 
facilitate typological comparison because it is “uncontroversial to assume that all
languages distinguish an intermediate category” between ω and ι.⁵⁹

While this assertion is perhaps premature in the face of reliable empirical 
studies on the prosody of probably less than 5% of all living languages, it shows
that some of the driving motivation for the proposal of prosodic recursion lies in 
this potential for universalization as well as the close correspondence with syntax. 
If these are not presupposed axiomatically, it is not problematic to assume 4 dis- 
tinct categories resembling the prosodic word in Limbu, for example, if they can 
indeed all be shown to be the domains of distinct processes. The evidence on pro- 
sodic boundary strengthening we have discussed is of a different kind, because if 
recursion is not assumed in those cases then the number of categories of the pro- 
sodic hierarchy will potentially rise indefinitely, assuming that gradual boundary 
differences can still be found in sufficiently long and complex utterances (which 
hasn't been tested so far).⁶⁰ This is conceptually problematic even if no assumptions
about the universality of prosodic categories or a tight mapping between syntax 
and prosody are made. It is clear in any case that prosodic recursion must have 
its limits, as argued for by both van der Hulst (2010) and Ladd (2008: 297–299): no
one has ever proposed an intonational phrase inside a syllable, for instance, and 
the relative “flatness” of prosodic structure as compared to the syntactic structure 
seems to be still agreed upon by everyone. In addition to clear evidence of abun- 
dant mismatch between syntax and prosody, there is also evidence that the same 
kind of syntactic embedding does not correlate with similarly recursive prosodic 
structures in different languages: Féry & Schubö (2010) found evidence from down-
step indicating recursive prosodic phrases in German utterances with syntactic 
center-embedding, but not for similar Hindi utterances. 

In the interest of adequate descriptions for each language under investigation, 
universal claims should be made very cautiously, while the general idea of differ- 
ent prosodic levels that stand in a hierarchical relation to each other and serve as 


59 This is somewhat ironic because for the analysis of her Irish examples, all full utterances, she 
makes no recourse to a second phrasal category (namely ι) at all, having no need for anything
above a maximal φ in terms of unexplained cues or structural analysis.

60 For Korean, Jun (2007) takes this path: she proposes the intermediate phrase based on findings 
about cues that resemble those for the AP, but in a stronger version (increase in tonal scaling, but 
depending on the AP-initial tones, which are in turn dependent on the AP-initial segments, cf. Jun 
2007: 160–161). Above the iP there is also an IP, which is clearly separate because it has different
cues (additional boundary tones, cf. Jun 2007: 155). Whether it is more parsimonious to take the
Korean intermediate level also as a recursive version of the AP, as Ito & Mester (2012) propose for
Japanese, or as its own category, as Jun (2007) does, remains a question of theoretical choice as long as it
is unclear whether the cue strength for it differentiates only between two versions (AP and iP) or
between indefinitely many, correlating with the number of proposed levels of recursive AP-embedding.
the domain of phonological rule application should always be kept in mind. Ladd
(2008: 298-299) proposes a weakened version of the SLH, in which the different 
levels are still ranked, but where compounding of levels is allowed in order to 
reduce the number of categories necessary for the adequate description of each 
language, explicitly with the motivation that this would “allow us to identify any
given boundary as being of one category or another on purely phonetic and phonological
grounds, without as it were looking over our shoulder at the theoretical
consequences for prosodic structure or for syntax-prosody mapping”.


3.6.4 What is (not) known about prosodic recursion in Spanish and Quechua 


As far as I have been able to determine, similar empirical studies have not been 
done on Spanish (or on any variety of Quechua). I will review some results from 
related studies on Spanish, but the issue of prosodic recursion based on empirical
evidence has itself not been tackled so far. Garrido et al. (1995) attempt to correlate
degree of final lengthening, pitch reset and pre-boundary pitch contours
in read Spanish utterances with differing syntactic boundaries,⁶¹ but only find
non-significant tendencies for final lengthening. Because they treat pitch reset and
pre-boundary contours only as categorical variables (presence or absence of reset and of a
pre-boundary peak), their findings in this respect are not interpretable for the issue
at hand. Rao (2008) investigates phrasing patterns in Barcelona Spanish, via SVO 
sentences read by 18 speakers that systematically differ in terms of the length (in 
prosodic words) and complexity (depth of syntactic branching) of both subject and 
object. He assumes that under a tight mapping between syntax and prosody and 
following a proposal originally from Ladd (1986), postnominal appositives should 
form a recursive phrase together with their head NP (Rao 2008: 100-101). He also 
provides the pitch track of an individual example where tonal scaling might reflect 
a difference in boundary strength: 

In Figure 15, pitch is scaled clearly higher at the end of the postnominal adjec- 
tival modifier inteligente y gordo de Barcelona than after el Javier, with which it 
forms an NP together. However, Rao (2008: 101) does not claim that this is recursive 
prosodic phrasing, unlike in the case of an appositive. He also does not make any 
statements about whether this scaling difference is an individual occurrence or a 
robust finding in this or similar conditions. Instead, he simply assumes that whenever
a phrasing such as [head noun]PhP [appositive]PhP occurs, this constitutes recursive


61 They do not refer to categories of the prosodic hierarchy, but under a close syntax-prosody mapping
their different syntactic boundaries would likely result in recursive phrasing.


Figure 15: Pitch track of a read Barcelona Spanish utterance from Rao (2008: 97), el Javier inteligente
y gordo de Barcelona me dona el libro verde “The intelligent and fat Javier from Barcelona donates
the green book to me”. The phrasing according to Rao (2008) is (El Javier)(inteligente y gordo de
Barcelona)(me dona el libro)(verde).


prosodic PhP phrasing. That cannot be counted as empirical evidence for it. His
criteria for identifying a PhP boundary (equivalent to an iP boundary) include the 
presence of continuation rises, decreases in pitch range, phrase-final lengthening 
and pauses (Rao 2008: 93), but when presenting the results, only the presence or 
absence of a boundary is marked, and no information on the cues is given. Interest-
ingly, his results show that even in the case of appositives, the “recursive” phras- 
ing is never the only, and sometimes not even the most frequent, pattern (cf. e.g. 
Rao 2008: 107). At a broader level, they show that syntactically motivated phrasing 
constraints aiming to map relevant syntactic boundaries to PhP boundaries can 
frequently be overridden by prosodic constraints aiming to create balanced struc- 
tures or those conforming to minimality or maximality constraints on the PhP, 
and that “as syntactic branching of utterances increases, prosodic concepts seem 
to play a more crucial role than syntactic conditions in determining the parsing 
of phrases” (Rao 2008: 120). In this regard, the results reported for Barcelona
Spanish are similar to those reported for Castilian Spanish in Prieto (2006) and for
Lima Spanish in Rao (2007) based on similar methodology⁶² (read speech, 3 speakers
from Madrid and one from Burgos in Prieto 2006, 3 Lima Spanish speakers in
Rao 2007). In both of those, phrasing patterns are explored via OT constraints, and 


62 Prieto (2006: 42) uses slightly different cues for the boundaries of phonological phrases than 
Rao (2007, 2008): her criteria are “the perception of a prominent stress (together with a level 2
phrase break in the ToBI framework)” as well as optionally a boundary rise. In general, the status
of the phonological phrase / intermediate phrase in Spanish is not settled, and no phrasing studies 
based on spontaneous speech have been undertaken (Hualde & Prieto 2015: 359-360). 
their results also suggest that the purely prosodic ones often outrank those aiming
for a close mapping between prosody and syntax. It should also be noted that only in
the study on Lima Spanish do any of the experimental conditions reach something 
approaching categorical preference for a given phrasing pattern, and then always 
the one most clearly following prosodic constraints instead of ones promoting a 
close mapping (Rao 2007: 96-97, 100–102). Overall, this is less strongly the case
for the Castilian Spanish study, where considerable variation in pattern preference 
between the speakers is reported (Prieto 2006: 46). 

In sum, the question of recursive prosody in Spanish (and Quechua) is still 
open to be answered via further research. These studies however suggest that the 
assumption of a tight mapping between syntax and prosody might not be the best 
point of departure to go looking for prosodic recursion, since they show prosodic 
phrasing in Spanish quite clearly to be subject to a number of constraints caring 
little about such a mapping.⁶³ It must also be noted that all the studies discussed
in this section (not only on Spanish) are based on read speech by fewer than ten
speakers each (except Rao 2008), and yet they also report on some considerable
variability in their results. No comparable studies addressing prosodic recursion 
have been done with more naturalistic data to my knowledge. In section 5.2, I will 
argue for prosodic recursion at the level of the IP based on data from Huari Spanish 
utterances that are both semi-spontaneous and quite long and complex, also at the
level of information structure. 


3.7 Information structure and prosody 


In the preceding sections, we discussed relevant aspects of prosodic typology 
mostly without referring to the dimension of meaning. In this section, I will first 
give a brief overview of the types of meaning intonation can convey and then
concentrate on information structure. I will give an introduction to the model of 
context and information structure that I follow in this work (section 3.7.1), lay out 
how it can be applied to the study of understudied languages without having to 
make prior assumptions about a relation between information structural catego- 
ries and formal means of expression (3.7.2), and then conclude with (what is known 
about) the relation between information structure and prosodic cues in Spanish 
and Quechua (section 3.7.3). 


63 The equal status of prosodic and syntactic constraints for prosodic structure is also emphasized 
in Féry (2015). 
Prosody and intonation aid in the signaling of a variety of propositional and
non-propositional aspects of meaning. Prosodic phrasing helps to disambiguate 
semantic and syntactic structures as well as word recognition both in spoken 
language (cf. e.g. Nibert 1999, 2000; Serrano 2010 on Spanish, cf. also Brown et al. 
2015) and probably even in silent reading (the “implicit prosody hypothesis”, cf. 
Fodor 2002, the works contained in Frazier & Gibson 2015). Intonation, especially 
contour choice, has been shown not only to play a decisive role in signaling utter- 
ance type in many languages, but also to convey a number of propositional atti- 
tudes and other non-at-issue content in several languages (among others, cf. Ward 
& Hirschberg 1985; Hirschberg & Ward 1992 on pragmatic meanings classified as 
epistemic uncertainty and incredulity in English; Vanrell et al. 2013; Vanrell et al. 
2017 on uncertainty and evidential meanings in Catalan polar questions; Grice & 
Savino 1997; Grice & Savino 2004; Savino 2012 on uncertainty and the difference 
between information-seeking and confirmation-seeking polar questions in Bari 
Italian; Venditti et al. 1998 on meanings including incredulity and insistence in Jap- 
anese; Wollermann et al. 2010; Wollermann et al. 2014 on uncertainty in German 
declaratives; Escandell-Vidal 2017 on evidential meanings in polar questions and 
Fliessbach 2023 on mirativity and obviousness in Spanish). Contour choice can also 
have an effect on the salience of conversational implicatures (Kurumada et al. 2014, 
Prieto 2015, Marneffe & Tonhauser 2019). Not least, prosodic cues are essential for 
the smooth working of the turn-taking and repair systems, two strong candidates 
for perhaps truly universal systematic features of human language (Stivers et al. 
2009; Bögels & Torreira 2015; Dingemanse et al. 2015; Levinson 2016).


3.7.1 Context and the question under discussion 


Here we will only treat one particular aspect of how prosody and intonation help 
to relate individual utterances to their linguistic and non-linguistic context, the 
signaling of information structure. This is because it will be the most relevant 
aspect in the analysis of the Huari Spanish and Quechua data. Other aspects of 
intonational meaning will also feature occasionally, but not be treated systemat- 
ically. The present work is mainly on prosody and not on formal pragmatics, but 
because these aspects of linguistic context are crucial factors for understanding 
prosody in spontaneous, ongoing discourses, some background to them will be 
provided here. Information structure relates to how the information contained 
in utterances is packaged with regards to the momentary knowledge states of 
participants, the discourse progression up to the point of utterance, and the 
intentions of participants for the discourse progression in the future (Chafe 1976; 
Krifka 2007). I assume a context model similar to the ones of Farkas & Bruce 
(2010) and Roberts (2012b). One of the main components of these models is the
common ground (CG, Stalnaker 1974, 2002). The common ground can be defined 
as the proposition specifying the set of propositions on which the participants
have reached an agreement with regard to their truth value. More accurately, it
is the proposition which speaker A believes that speaker B believes that speaker 
A believes that speaker B believes . . . specifies the set of propositions which the 
speakers have agreed upon. This formulation preserves the fact that the common 
ground is always a doxastic projection by each individual participant about what 
they believe the common ground to be. Crucially, it is also necessary to include 
a performative aspect here, in that the common ground is whatever speakers treat
as the common ground in their behaviour: this can be out of pretense or forgetfulness or for
another reason (Stalnaker 2002: 704–705).⁶⁴ However, it is also assumed that speakers
are basically rational in the pursuit of their intentions and communicate
cooperatively in the sense of Grice (1989); this assumption effectively allows
speakers to ascribe reasons (intentions) to their interlocutor’s behaviour and to
keep calculating it rationally even when it deviates from how they themselves
would expect their interlocutor to behave given the state of CG, instead of
having to assume that the interlocutor behaves randomly.

What is not in CG can be seen as the set of propositions for which no truth 
value has yet been agreed upon. That is to say, these propositions define the set of 
all possible worlds that are compatible with CG (the context set in Roberts 2012b: 
4). The assertion by one speaker of a proposition, publicly committing to a truth 
value for it, and the acceptance of this assertion by the interlocutor(s), moves the 
proposition to CG, reducing the context set. This movement of propositions from 
context set to CG is the basic drive of the discourse progression.⁶⁵ In order to model
its directedness and coherence in a hierarchical structure, Roberts (2012b) employs 


64 Recasting this point somewhat, one could say that the social act of committing to an action is 
more relevant here than whatever mental states are responsible for it: acting like a certain proposi- 
tion is part or not part of CG is a commitment by a speaker to another to be consistent with respect 
to this position, and breaking such commitments has the social consequence of not being treated as 
a reliable communication partner anymore (cf. Geurts 2019). 

65 This is necessarily an extremely simplified picture. Not only are there speech acts other than 
assertions and questions, such as directives, that have the aim of getting another speaker to change 
the way things in the world are instead of increasing shared knowledge about how they are, but 
there are also assertions with additional presupposed or non-at-issue modal meanings such as 
surprise and obviousness that force interlocutors to revise some of the assumptions already shared 
by them and contained in CG (surprise / mirativity) or that make statements about the fact that the 
content of an assertion is already contained in or entailed by CG and therefore not newsworthy 
(obviousness). See Reich (2018); Fliessbach (2023) for a discussion of these meanings. 
the device of the question under discussion (QUD, also used in Farkas & Bruce 2010;
Ginzburg 2012, and several others). In her QUD model, discourse progresses via 
the posing and answering of (mostly implicit) questions. By accepting a QUD as the 
current one, discourse participants agree to keep making attempts at answering 
it until they either agree that it has been fully answered or that they must leave it 
unanswerable for the moment (Roberts 2012b: 6-7). Questions are only felicitous 
when their answers are not entailed by CG⁶⁶ and when they can be construed to be
relevant, that is when answers to them contribute to providing (partial) answers
to the current QUD in the discourse⁶⁷ (Simons et al. 2010: 316-317; Roberts 2012b:
21), with a partial answer defined as providing an evaluation of at least one of the 
propositions comprising the alternative set of the question, i.e. all possible answers 
to it (Roberts 2012b: 11-12, based on the semantics of questions first proposed in 
Hamblin 1958, 1973). Assertions are also only felicitous when they are not entailed 
by CG and when they are relevant, providing a (partial) answer to the current QUD. 
The entailment and relevance conditions define permissible question strategies, i.e. 
those that aim to completely answer sub(-ordinate) questions in order to partially 
answer super(-ordinate) ones; once a sub-QUD has been answered, the super-QUD 
it is entailed by becomes the current QUD again, giving discourse overall a hierar- 
chical QUD structure. The question alternative set is pragmatically restricted by the 
discourse context, as in (25), where the context reduces the question alternative set 
to three members from the potentially infinite number of propositions about Alice 
winning some prize. 


(25) Context: Alice, Claire, and David each won a prize at a sheep shearing 
competition. They bring home a pair of golden scissors, a silver tuft of sheep's 
wool, and a bar of lanolin soap. 

Question: What did Alice win? 
Pragmatically restricted question alternative set: {Alice won the golden
scissors, Alice won the silver tuft of wool, Alice won the lanolin soap}
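
The mechanics described above (context set, assertion as update, restricted question alternative set, partial versus complete answers) can be illustrated with a small sketch based on (25). It is only an illustration under simplifying assumptions that are not part of the models cited: worlds are reduced to assignments of the three prizes to the three winners, each person is assumed to have won exactly one prize, and all names (`CONTEXT_SET`, `won`, `update`, `settled`, `answer_status`) are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List

World = Dict[str, str]                 # maps each winner to a prize
Proposition = Callable[[World], bool]  # true or false in a given world

WINNERS = ["Alice", "Claire", "David"]
PRIZES = ["golden scissors", "silver tuft of wool", "lanolin soap"]

# Context set: all worlds compatible with CG (assumption: one prize each).
CONTEXT_SET: List[World] = [dict(zip(WINNERS, p)) for p in permutations(PRIZES)]


def won(person: str, prize: str) -> Proposition:
    """Build the proposition 'person won prize'."""
    return lambda world: world[person] == prize


# Pragmatically restricted alternative set of "What did Alice win?"
ALTERNATIVES: List[Proposition] = [won("Alice", prize) for prize in PRIZES]


def update(context_set: List[World], proposition: Proposition) -> List[World]:
    """Accepting an assertion removes every world in which it is false."""
    return [w for w in context_set if proposition(w)]


def settled(context_set: List[World], proposition: Proposition) -> bool:
    """A proposition is evaluated if it has one truth value in all worlds left."""
    return len({proposition(w) for w in context_set}) == 1


def answer_status(context_set: List[World], alternatives: List[Proposition]) -> str:
    """Classify the context with respect to a question: none, partial, complete."""
    n = sum(settled(context_set, alt) for alt in alternatives)
    if n == len(alternatives):
        return "complete"
    return "partial" if n > 0 else "none"


print(answer_status(CONTEXT_SET, ALTERNATIVES))  # none
print(answer_status(update(CONTEXT_SET, won("Claire", "lanolin soap")), ALTERNATIVES))  # partial
print(answer_status(update(CONTEXT_SET, won("Alice", "golden scissors")), ALTERNATIVES))  # complete
```

In this toy setting, accepting that Claire won the lanolin soap counts as a partial answer because it evaluates one member of the alternative set (Alice won the lanolin soap) as false, while leaving the other two open.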


66 This is again a simplification. In practice, questions whose answers are entailed by CG do occur 
and are likely to provoke an assertion that is marked for obviousness (cf. Fliessbach 2023). Effec- 
tively they and their responses represent highly marked discourse moves. 

67 Based on a discussion of utterances containing epistemic modals and evidentials in discourse, 
Roberts (2015: 52–53, 2017: 30-31) introduces a revised notion of relevance which allows discourse
moves (questions and assertions) to be indirectly relevant, i.e. felicitous, when they provide an 
assessment in terms of the likelihood that a relevant proposition is true or false (see also Simons 
et al. 2010: 316–317, note 3).
Propositions forming the question alternative set of a wh-question all share a
similar structure in that they only differ with regards to which element replaces 
the wh-element in the question, with the appropriate type of entities specified by 
the choice of the wh-element (cf. Roberts 2012b: 10). Polar questions have a ques- 
tion alternative set comprising only two members that only differ with respect to 
their absolute polarity (cf. especially Farkas & Bruce 2010 for a treatment of polar 
questions in discourse). Alternative questions have effectively the same kind of 
question alternative set as wh-questions, with a specified constituent in the ques- 
tion to be replaced by one in the answer, but the answers treated as likely by the 
speaker are usually fully listed in the question, and the restriction on appropriate 
types of entities is provided only by context, not the choice of wh-element (e.g. Do 
you prefer Canada, going home, or Chadwick Boseman?; cf. on alternative questions 
Sadock & Zwicky 1985: 178-179; Riester & Shiohara 2018: 292). A felicitous answer 
to a question must be the assertion of a proposition that is a member of the ques- 
tion alternative set; this question-answer coherence is pragmatically ensured via 
the Gricean maxim of relevance/relation, according to Roberts (2012b: 21) (cf. also 
Grice 1989; Krifka 2007: 22-23). 

While this conception is probably not fully capable of providing a realistic 
model of discourse (cf. Riester et al. 2018: 405 for this view as well as e.g. Roberts 
2012a: 8-14 for a number of outstanding issues), it couples two major mechanisms: 
discourse coherence (via question-answer coherence and the hierarchical QUD 
structure) and the internal division of utterances via their information structural 
status. The latter has two aspects: the division between what is and what isn't at-is- 
sue and the one between focus and background. The former of these describes a 
meaning distinction at the level of propositions: in an utterance, only those propositions
are at-issue (AI) that are relevant with regard to the current QUD (Simons
et al. 2010: 317; Beaver et al. 2017: 280; Roberts 2017: 9); those that aren't are
non-at-issue (NAI). The AI/NAI division is a broad label encompassing aspects of a range
of phenomena such as pragmatic presuppositions, appositives, non-restrictive rel- 
ative clauses, evidential meanings, conventional implicatures, and the projection 
behaviour of factives (cf. e.g. Tonhauser et al. 2013; Faller 2014; Bianchi et al. 2016; 
Tonhauser et al. 2018; cf. Potts 2015 for an overview). One diagnostic for at-issue- 
ness is whether content can be directly targeted by polarity particles in responses, 
according to Roberts (2015: 44-46, 2017: 9). According to this diagnostic, some of the 
meanings conveyed by intonation are also non-at-issue (cf. for this view also Potts 
2005: 26, 36-37; Prieto 2015): 
(26) Context: Alice, Bob, and Doreen are living together. They are going to have
friends over for dinner. 
Alice: Who’s coming for dinner? 
a) Bob: Jonathan (L*H L-H%) 
Doreen: a. No [he’s not coming] 
b. #No [you're not uncertain] 
c. Why so unsure about it? You told me he was coming five 
minutes ago! 
b) Bob: I'm not sure about Jonathan
Doreen: Yes [you are – you told me he was coming five minutes ago]


In (26)a), Bob answers Alice's question with the uncertainty contour on Jonathan. 
Doreen can deny the truth of the at-issue proposition that Jonathan is coming (answer-
ing the current explicit QUD, (26)a)a), but she cannot deny the non-at-issue content 
that Bob is uncertain conveyed by the intonational contour directly by using only 
the response particle no ((26)a)b). In order to do that, she has to perform a marked⁶⁸
move requiring much more explicit elaboration, like (26)a)c. That this restriction does not
arise merely because Bob's uncertainty is a subjective
mental state accessible only to him is shown in (26)b), where the uncertainty
meaning is lexically conveyed in the main clause and at-issue (in the sense that 
Bob's public commitment about being unsure has a bearing on the probability that 
the answer to the relevant question Is Jonathan coming? can be taken to be true, cf. 
Roberts 2015, 2017). For Roberts (2017: 7), the focus-background division is part of 
the mechanism that can separate AI from NAI content, as in (27): 


(27) (from Roberts 2017: 7) 
Context: A, B, and C are discussing the race that B and C just watched, in 
which their mutual friend Mary was supposed to be a participant. 
A: How did Mary run? 
B: She ran [QUICKLY]F – one of her best times ever!


68 Farkas & Bruce (2010) provide reasoning about what kind of conversational moves are more or
less marked depending on context. They model the relatively privileged status of some responses 
to certain provocations using the device of the projected set, which contains the immediate future 
state(s) a context will assume upon an unmarked response to a given provocation. For example, 
they argue that the unmarked response to an assertion is to accept it (and thus for the asserted 
proposition to enter CG), and that to a neutral polar question is to commit to either of the proposi- 
tional alternatives provided by the polarity, but that biased polar question types differ in precisely 
this property, i.e. that they only contain one of the polar alternative propositions in the projected 
set (Farkas & Bruce 2010: 93, 96-97). 
C: a. No she didn't!
b. Hey, wait a minute! Mary didn't run at all! / But Mary didn't run at all! 


In (27), the explicit QUD posed by A already presupposes that Mary ran; it asks only
how she ran, which is answered in B's response by the focused adverb quickly. It
is only this assertion about Mary's quickness which C can target in their contradic- 
tion by saying no she didn't; if they wanted to challenge the backgrounded content 
that she ran, they would again have to resort to a more marked response like the 
ones in (27)b. In this conception, the focus-background division is a special case of 
the AI-NAI division as applied to a particular type of content, the proffered content: 
“the compositionally calculated truth conditional content of the expression; what it 
contributes to what is asserted, asked or directed by an utterance in which it occurs" 
(Roberts 2017: 6). Only (parts of) the proffered content can then be at-issue at all. 
Other types of content, the presupposed and the auxiliary content, which include 
conventionalised implicatures and the type of meaning conveyed by intonation in 
(26), for example, are always non-at-issue. Under this view, focus and background 
can essentially be subsumed via cross-classifying types of contents and at-issue- 
ness: focus would be [+at-issue, +proffered] and background [−at-issue, +proffered].
Others, although using the same terminology, draw the distinction elsewhere: 
Riester et al. (2018); Riester & Shiohara (2018); Riester (2019) develop a model for 
information structural annotation of actual conversations based on QUD structure, 
but they reserve the label of “non-at-issue” for those types of content Roberts would
class as presupposed and auxiliary, and explicitly exclude backgrounded but prof- 
fered material from it (cf. Riester et al. 2018: 428). Although Roberts' proposal is theoretically
neat because it achieves a comprehensive unification, I will here adopt
the convention followed by Riester and colleagues, simply because their goals align 
more immediately with my own in this work: to operationalize the QUD model for 
the categorization of segments of actual spoken conversation according to infor- 
mation structure, independently of linguistic form, so that prosodic exponents can 
then be related to them without entering a circular argumentation (cf. Riester et al. 
2018: 405–406). In the following, I will use their definitions to discuss the informa-
tion structural notions of focus, background, and topic. 


3.7.2 QUD-based annotation of information structure independent of form 


Focus is very heterogeneously defined in the linguistic literature (cf. Krifka & Musan 
2012: 6-7, 17-18; Riester & Shiohara 2018: 290–291 for overviews). The definition
adopted by Riester et al. (2018: 417); Riester & Shiohara (2018: 291–292); Riester (2019:
166) is to say that the focus in an utterance is exactly that part which answers the 
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current QUD, while its counterpart, the background, is material that is proffered but 
already given in the question itself. I will also adopt this definition of focus here. It has 
the crucial advantage of being entirely based on context (as described in QUD terms). 


(28) QUD-focus relation in wh-questions

a. Q: What happened?
   A: [Sharon sheared her sheep on the first Saturday in September]F
b. Q: What did Sharon do?
   A: Sharon [sheared her sheep on the first Saturday in September]F
c. Q: When did Sharon shear her sheep?
   A: Sharon sheared her sheep [on the first Saturday in September]F
d. Q: Who sheared their sheep on the first Saturday in September?
   A: [Sharon]F sheared her sheep on the first Saturday in September
e. Q: When in September did Sharon shear her sheep?
   A: Sharon sheared her sheep [on the first Saturday]F in September


(29) QUD-focus relation in polar and alternative questions

a. Q: Did Sharon shear her sheep on the first Saturday in September?
   A: [Yes]F / [Yes, she did]F / [Yes, Sharon sheared her sheep on the first
      Saturday in September]F
b. Q: Did Sharon shear her sheep or her goats on the first Saturday in
      September?
   A: Sharon sheared [her sheep]F on the first Saturday in September


In (28), the same utterance, Sharon sheared her sheep on the first Saturday in September,
has different focal domains depending on the preceding question context.
The focused element(s) correspond exactly to the open variable in the wh-question
and are marked by square brackets with a subscripted F. All material that is already
given in the question is not part of the focus in the answer and is thus backgrounded.
As mentioned above, in the conception by Roberts (2017), it would also be non-at-issue.
In actual conversation, the full answer utterances would most likely not be produced.
Instead, most or all of the material in the background would probably be left
out, but note that the focal material could not be further reduced given the QUD.
This is also true in (29)a, where, if neither of the first two variants is chosen, the full
utterance has to be produced to be felicitous. The bracketed notation for the focal
material is intended merely as a notational aid: the property of being in focus is 
bestowed entirely by the context in this conception, and no specific focus marking 
is necessary, although various expressive means, a number of them prosodic, are 
employed to various degrees in different languages to aid in the interpretation of 
which material is focal and which is backgrounded (see below). 
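
The purely contextual definition can be illustrated with a small sketch. It is a toy illustration and not part of the annotation procedure of Riester and colleagues: the segmentation of the answer into constituents and the decision about which constituents count as given in the question are entered by hand (so that, for instance, shear and sheared are simply treated as the same material), and the function names `label_focus` and `bracket` are hypothetical.

```python
from typing import List, Set, Tuple

def label_focus(answer: List[str], given_in_qud: Set[str]) -> List[Tuple[str, str]]:
    """Label each answer constituent as background ("BG") if the QUD
    already contains it, and as focus ("F") otherwise."""
    return [(c, "BG" if c in given_in_qud else "F") for c in answer]

def bracket(labelled: List[Tuple[str, str]]) -> str:
    """Render the labelled answer, grouping adjacent focal constituents
    into one bracketed span marked with F."""
    out, run = [], []
    for text, label in labelled:
        if label == "F":
            run.append(text)
        else:
            if run:
                out.append("[" + " ".join(run) + "]F")
                run = []
            out.append(text)
    if run:
        out.append("[" + " ".join(run) + "]F")
    return " ".join(out)

# Hand-segmented answer from (28); the segmentation is an assumption.
ANSWER = ["Sharon", "sheared her sheep", "on the first Saturday", "in September"]

# (28a) What happened? Nothing is given in the question.
print(bracket(label_focus(ANSWER, set())))
# (28c) When did Sharon shear her sheep?
print(bracket(label_focus(ANSWER, {"Sharon", "sheared her sheep"})))
# (28e) When in September did Sharon shear her sheep?
print(bracket(label_focus(ANSWER, {"Sharon", "sheared her sheep", "in September"})))
```

The same answer string receives a different focus span for each question, and no property of the answer itself enters into the computation, which is the point of the definition.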
In this respect, this focus definition differs from another very influential one, originally
proposed in Rooth (1985, 1992) and defended in Krifka (2007); Krifka & Musan
(2012), namely that focus indicates the presence of alternatives relevant for the 
interpretation. Because of the way the context is modeled, with felicitous answers 
effectively coming from the question alternative set of the QUD (Roberts 2012b), 
the two definitions cover a lot of common ground. However, while in the definition
adopted here, focus is essentially just a label for that part of an utterance which
directly corresponds to the QUD (and this means that alternatives will be relevant
for its interpretation by definition), in the other it is something (a feature)
which actively acts to indicate that alternatives should be taken into consideration
for interpretation. This is made explicit in the revised definition given in Krifka
(2007: 19): “A property F of an expression α is a Focus property iff F signals (a) that
alternatives of (parts of) the expression α or (b) alternatives of the denotation of
(parts of) α are relevant for the interpretation of α.” Here, focus is a property of an
expression which signals a meaning, just like e.g. a morphological tense marker.
After some discussion that alternatives are also relevant in the interpretation of
other expressions lacking focus, Krifka & Musan (2012: 7) refine the definition
given there by saying that it “means to say that focus especially stresses and points
out the existence of particular alternatives”, which again implies the presence of
some kind of focus signifier. From this perspective, it also makes sense to state, as
they do, that it seems that some African languages do not make “use of focus to
mark question-answer coherence” (Krifka & Musan 2012: 10, note 4) given findings
that these languages do not employ any expressional means to mark the difference
between focus and background in question-answer pairs. From the perspective of
the definition adopted here, the argument is subtly different: the answer to the
QUD, as long as it can be clearly established, is the focus by definition, but it is
clearly an open question typologically whether and how individual languages will
employ any means that help to reconstruct such a context in interpretation.

It has repeatedly been observed that languages differ with respect to the 
expressional dimension of focus, and plausible hypotheses have been made about 
what makes focus marking more likely, e.g. based on an asserted proposition's 
status with respect to expectations built up in the discourse (Zimmermann 2008), 
or on a notion of scale of strength applied to different proposed types of foci (Féry 
2013: 689-690). However, if focus is fundamentally conceived of as some kind of 
property of expressions, then making any kind of comparison between what can 
count as a focus marker becomes a circular enterprise because context cannot 
serve as a true third of comparison, but only as an insufficient indicator. This circularity
and its gloomy consequences for the investigation of information structure
across languages are the explicit reason why Riester et al. (2018: 406); Riester & Shiohara
(2018: 290) define focus and all other information structural notions strictly
based on context instead of on any linguistic form. Doing so means assuming that
the underlying mechanism of information transmission and the essential pragmatics
concerned with its packaging are effectively universal or at least sufficiently
similar across languages, to the extent that a model can accurately describe them
(Riester et al. 2018: 405-406). Effectively, the assumption is that this mechanism
is shaped directly by universal aspects of human communication, such as the
Gricean cooperative principle and the maxims resulting from it.⁶⁹ In this regard,
the position adopted here from Riester and colleagues, locating the universality in 
the pragmatics, is different from both the Rooth/Krifka position, which locates it in 
some part of the expressional machinery, presumably in the syntax, and that taken 
by Matić & Wedgwood (2013), who deny any universality to focus as an information
structural category. Whichever of these positions might eventually turn out
to be true, the approach by Riester and colleagues has the clear methodological
advantage that focus and other IS-notions can be identified across languages and
from context alone, and thus makes it possible to correlate definable context conditions with
whatever means of expression a language might or might not employ. In order to
do that in the analysis of naturally occurring conversation, criteria for the identification
of implicit QUDs must be set down. This is highly important because in many
types of conversations, explicit questions are quite rare, and instead, sequences
of assertions occur that are nonetheless taken to follow a hierarchical QUD structure
(Riester & Shiohara 2018: 292). For them, the implicit QUDs have to be reconstructed,
as it were, in order to be able to make statements about the information
structural division of the assertion at hand.

The first and broadest of these criteria is question-answer congruence, i.e. that
the observed assertion must be a felicitous answer to the implicit QUD (Riester
et al. 2018: 411-412). Question-answer congruence allows any of (30)a-c (and some
others) to be identified as implicit QUDs for the assertion in (30), but prohibits questions
such as (30)d-e because they are not answered by the assertion.


69 Presumably, this can also be brought into relation with Chafe's (1994: 109-110) “one new idea”,
and also with how information transmission is described by Shannon (1948), namely that it eliminates
possible states a system could assume. If no indication were available from context-based
expectation as to which location in an utterance is relevant for the elimination of such possibilities, all
transmitted material would be equally informative and no inference about the intended direction
of the discourse would be possible, which would arguably divest humans of a large part of their
intention-reading skills, effectively grinding communication to a halt.
(30) Question-answer congruence

Assertion:
   Alice went down the rabbithole

Question:
a. What happened?
b. Where did Alice go?
c. Who went down the rabbithole?
d. # What is a rabbithole?
e. # When did Alice go down the rabbithole?
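
As a rough sketch of how question-answer congruence filters candidate QUDs, the assertion in (30) can be represented as a set of role-value pairs and each question as the roles it presupposes plus the roles it asks for. This representation, the role labels, and the names `Question` and `is_congruent` are assumptions made for illustration only; they are not taken from Riester et al. (2018).

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class Question:
    text: str
    given: Dict[str, str] = field(default_factory=dict)  # material presupposed by the question
    wh: Set[str] = field(default_factory=set)            # roles the question asks for

def is_congruent(assertion: Dict[str, str], q: Question) -> bool:
    """The assertion answers q if it agrees with everything the question
    presupposes and supplies a value for every role the question asks for."""
    given_ok = all(assertion.get(role) == value for role, value in q.given.items())
    wh_ok = all(role in assertion for role in q.wh)
    return given_ok and wh_ok

# "Alice went down the rabbithole", hand-coded (hypothetical role labels).
ASSERTION = {"agent": "Alice", "event": "go", "path": "down the rabbithole"}

QUESTIONS = [
    Question("What happened?", {}, {"agent", "event", "path"}),
    Question("Where did Alice go?", {"agent": "Alice", "event": "go"}, {"path"}),
    Question("Who went down the rabbithole?",
             {"event": "go", "path": "down the rabbithole"}, {"agent"}),
    Question("What is a rabbithole?",
             {"event": "be", "theme": "rabbithole"}, {"definition"}),
    Question("When did Alice go down the rabbithole?",
             {"agent": "Alice", "event": "go", "path": "down the rabbithole"}, {"time"}),
]

for q in QUESTIONS:
    mark = "" if is_congruent(ASSERTION, q) else "# "
    print(mark + q.text)
```

Questions (30)d-e come out marked with # because the assertion either conflicts with what the question presupposes or supplies no value for the requested role.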


Possible implicit QUDs are further constrained by two principles that capture a
crucial difference between implicit QUDs and explicit ones: implicit QUDs, not
being actually performed discourse moves, cannot possibly introduce any new
material; they can consist only of material that is already given⁷⁰ in the context (and
combine it with operators, such as a wh-element). This principle is called Q-Givenness
in Riester (2019: 174). The second principle is called Maximize-Q-Anaphoricity
(Riester et al. 2018: 412; Riester 2019: 175) and it states that implicit QUDs should
not only consist of given material, but that they “should contain as much given
(or salient) material as possible”. “As possible” is intended to be constrained by the
preceding context and the observed assertion: this means that all material that can
be determined as given in the answer (from the preceding discourse context) will
be contained in the reconstructed implicit QUD. Effectively, Maximize-Q-Anaphoricity
thus works to narrow the focus as much as possible both in the implicit QUD and,
in turn, in the assertion (cf. Riester 2019: 175). We can see how the three principles
act together in identifying implicit questions in the analysis of a short example
excerpt from the Quechua corpus:


70 With a definition of givenness adopted from Baumann & Riester (2012), where it is broadly 
defined as an expression or a referent having been made available in the preceding discourse con- 
text, with a separation between referential and lexical givenness and a temporal decay of cognitive 
activation included in the model. See also Riester (2019: 174, note 8); cf. section 6.2.3.3 for a more 
detailed discussion. 

71 Thus, while implicit QUDs are considerably constrained by the assertions they are reconstruct- 
ed from, explicit questions, with only general constraints of coherence placed on them, can truly 
change the direction of discourse (cf. Riester 2019: 174-175). 
(31) TP03 KP04 MT Q 2640-2704⁷²
time KP04 (with the path on the map)  TP03 (without the path on the map)
264.0 alli-m
good-ASS 
alright 
265.0 naa tsawra tillaku-pita-qa 
PSSP then lightning-ABL-TOP 
well then from the lightning 
266.8 ya 
yes 
yes 
267.5 pasa-rku-y manka-yaq 
pass-DIR-INF pot-TERM 
go up to the pot 
268.7 manka-pa hana-n-pa-m pasa-n 
pot-GEN above-3-GEN-ASS pass-3 
it goes above the pot 


(31) is a short excerpt from a maptask experiment forming part of the corpora 
analyzed in this work. As Riester et al. (2018: 408) emphasize, it is very important 
that a corpus analyzed in this way should be well understood by the analyst. This 
also includes knowledge about the type of conversation (in this case, the kind of 
experimental communication game), its overall context, and the speakers involved. 
General information on these matters is provided in sections 2.3 and 2.4. Here, it is 
relevant to know that the speakers are in the process of tracing the path for a second 
time already, after the first attempt led to confusion (because two landmarks are 
intentionally different on the two speakers’ maps, cf. Figure 188 and Figure 189 in 
Appendix B). Their discourse proceeds from landmark to landmark, having begun 
with one where they were sure to be still in agreement. At 264.0, KP04 signals that 
the discussion about the preceding landmark is complete and that they can move 
on to the next issue; in terms of Farkas & Bruce (2010), the Table of current issues is 
empty (but in terms of QUD structure following Roberts 2012b, very superordinate 
QUDs such as WHAT IS THE OVERALL TRAJECTORY OF THE PATH? are still active). We 
will proceed backwards through the example, for reasons that become clear soon. 
At 268.7, the only new element in the assertion is hana-n-pa-m “above”; all the other 
elements are given already in the preceding turn. Thus by the principle of Q-Givenness,
the meanings encoded by manka-pa “of the pot” and pasa-n “s/he/it passes”


72 https://osf.io/d7tzc
are included in the implicit QUD, a good candidate for which would be something
like WHERE IN RELATION TO THE POT DOES THE PATH GO?, so that only hana-n-pa-m is
in focus in the assertion. Note that Maximize-Q-Anaphoricity also prevents WHERE
DOES THE PATH GO NOW?, with a correspondingly broader focus [manka-pa hana-n-pa-m]F.
Moving one turn backwards to 267.5, although this is a directive, we can
apply basically the same kind of reasoning: manka-yaq “to the pot” is not given in
the context, so the implicit QUD must at least ask for it and it is in focus. Regarding
pasa-rku-y “move in a direction”, strictly speaking it is not given in the preceding
context and thus the most apt implicit QUD could be something like WHAT NEXT?,
effecting the broadest possible focus. On the other hand, it could be argued that
the action of passing from one place to the other in this conversational game is so
commonplace that it is always salient and thus given. Then, by Q-Givenness and
Maximize-Q-Anaphoricity, the implicit QUD would be something like WHERE TO GO?,
with a narrower focus in the assertion just on manka-yaq. Deciding this issue can
be facilitated by considering the entire corpus, but in the end it is probably more
important that the analytical decision of how to treat such verbs and the actions
denoted by them should be consistent. Moving backwards by yet another step
to the peninitial turn (in the excerpt) by the same speaker, a less simple decision
awaits us. At first glance, we could reconstruct an implicit QUD like FROM WHERE
THEN? for naa tsawra tillakupitaqa “well then from the lightning” at 265.0, with
Maximize-Q-Anaphoricity working to include tsawra “then”, and perhaps even the
meaning encoded by the ablative suffix -pita, into it. However, this turn is not an
assertion by itself. The only argument for that could be that it is directly followed
by backchanneling by the other speaker, but if the turns by KP04 at 265.0 and 267.5
were produced in a single utterance, it would be clear that tillakupitaqa does not
constitute an independent speech act. Instead, it seems to fulfill a topical function
here:⁷³ the directive at 267.5 is referentially disambiguated only by its presence. I
want to argue further that it is a contrastive topic, albeit of a type called implicit
contrastive topics by Riester & Shiohara (2018: 300-301).
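
The interplay of the principles in the analysis of the turn at 268.7 can be sketched as a simple selection procedure. This is a deliberately simplified illustration under several assumptions: meanings are reduced to hand-assigned concept labels (POT, ABOVE, PASS), the path is treated as part of the PASS concept, congruence is approximated by requiring the QUD content to recur in the assertion, and the third candidate QUD as well as the function names are hypothetical additions for contrast.

```python
from typing import Dict, List, Set, Tuple

Assertion = List[Tuple[str, str]]  # (surface form, hand-assigned concept label)

def admissible(content: Set[str], assertion: Assertion, given: Set[str]) -> bool:
    """Q-Givenness: the QUD may contain only given material.
    Congruence (simplified): that material must recur in the assertion."""
    in_assertion = {concept for _, concept in assertion}
    return content <= given and content <= in_assertion

def reconstruct_qud(candidates: Dict[str, Set[str]],
                    assertion: Assertion, given: Set[str]) -> str:
    """Maximize-Q-Anaphoricity: among admissible candidates, pick the one
    containing the most given material. Raises ValueError if none is admissible."""
    ok = {q: c for q, c in candidates.items() if admissible(c, assertion, given)}
    return max(ok, key=lambda q: len(ok[q]))

def focus_of(assertion: Assertion, qud_content: Set[str]) -> List[str]:
    """The focus is whatever the assertion adds to the content of the QUD."""
    return [form for form, concept in assertion if concept not in qud_content]

# Turn at 268.7; POT and PASS are given from the directive at 267.5.
ASSERTION_268_7 = [("manka-pa", "POT"), ("hana-n-pa-m", "ABOVE"), ("pasa-n", "PASS")]
GIVEN = {"POT", "PASS"}

CANDIDATES = {
    "WHERE IN RELATION TO THE POT DOES THE PATH GO?": {"POT", "PASS"},
    "WHERE DOES THE PATH GO NOW?": {"PASS"},
    # Hypothetical candidate, excluded by Q-Givenness because ABOVE is new:
    "WHAT IS ABOVE THE POT?": {"POT", "ABOVE"},
}

qud = reconstruct_qud(CANDIDATES, ASSERTION_268_7, GIVEN)
print(qud)
print(focus_of(ASSERTION_268_7, CANDIDATES[qud]))
print(focus_of(ASSERTION_268_7, CANDIDATES["WHERE DOES THE PATH GO NOW?"]))
```

The selected QUD leaves only hana-n-pa-m in focus, whereas the less anaphoric candidate would yield the broader focus on manka-pa hana-n-pa-m. The decision discussed above for pasa-rku-y corresponds, in this sketch, to the choice of whether PASS is entered into the set of given concepts.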


73 The suffix -qa or its variants is usually identified as a “topic marker” in Quechua grammars (cf.
amongst others Parker 1976; Weber 1989; Cusihuamán 2001; Adelaar & Muysken 2004), and the 
gloss used here also indicates this. However, topic definitions are almost as numerous as focus ones 
(see Roberts 2011), and Weber (1989) describes uses of it for Huallaga Quechua that do not square 
well with all of them. In addition, in keeping with the approach laid out by Riester and colleagues, 
I try to refrain as much as possible from building arguments for information structural analysis 
from evidence based on formal linguistic features, instead of context-content relations. Section 
6.4 will be particularly concerned with disentangling the cues for IS from different domains from 
context-based interpretation in Huari Quechua. 
Contrastive topics, according to Büring (2003); Roberts (2012b), are the result
of a complex strategy of inquiry that involves parallel subquestions. That is to say,
when a complex QUD containing two wh-variables, such as WHO SHEARED WHICH
SHEEP?, is to be answered, then this is often done by asking the subquestions that
result from filling one of the variables with the individual members of its answer set,
such as SEAN SHEARED WHICH SHEEP?, SIAN SHEARED WHICH SHEEP?, SHANE SHEARED
WHICH SHEEP?. This implies providing (partial) answers to the first variable. Thus
the superquestion WHO SHEARED WHICH SHEEP? is answered by first answering WHO
SHEARED SHEEP?, and then substituting the answers into the superquestion, yielding
subquestions that are parallel. This is a viable strategy because each subquestion is
entailed by the superquestion, and each answer to a subquestion is a partial answer to
the superquestion and thus relevant. In the answers to the subquestions, yielding
parallel structures such as Sean sheared Bessy, Berta, and Balu; Sian sheared Billy,
Bartholomew, and Boris; Shane sheared Bob, Bonita, and Brenda, the sets of sheep
{Bessy, Berta, Balu}, {Billy, Bartholomew, Boris}, and {Bob, Bonita, Brenda} are
at-issue with regards to the current QUD, and the shearers Sean, Sian, and Shane,
as contrastive topics, are at-issue with regards to the preceding QUD. According
to Riester & Shiohara (2018: 295); Riester et al. (2018: 422-423), the parallel structure
of such assertions indicates the complex relationship they are in, namely that
they aren't just relevant for the current QUD, but answer such a superquestion in
concert.⁷⁴ However, it is precisely this overt parallelism which at first sight seems
to be missing in our example (31). Riester & Shiohara (2018: 300) remark that in 
their Sumbawa corpus, such overtly parallel structures do not occur, and it seems 
likely that they are in general quite rare in naturally occurring discourse (they 
do come up on occasion in our own corpora). They nevertheless argue that turns 
similar to our example should be analyzed as containing implicit contrastive topics 
because, while only realizing one of the parallel answers overtly, their interpreta- 
tion in context implies the existence of such relevant (with regards to a super-QUD 
in the discourse) parallel predications made about other entities that would also be 
contrastive topics. They cite considerations that continuing topics, which are not
contrasted, usually do not need to be overtly expressed at all (of course depending
on the possibility for the language in question to not realize given arguments), and
that when such elements are nevertheless overtly realized, they often indicate a topical
change, i.e. a contrast, and hence such an implicit parallel structure, provided this is
corroborated by the broader context (Riester & Shiohara 2018: 301). In our example, the


74 Cf. also the definition for so-called delimitators, comprising contrastive topics and frame-setters 
together, in Krifka & Musan (2012: 32-34), which also emphasizes that such structures cannot be 
interpreted purely locally. 
case can be argued quite well that tillakupitaqa is an implicit contrastive topic. In
the preceding discourse, speaker KP04 (the instructor in the maptask) had already
tried to move the conversation to where the path leads from the lightning (as part
of the repeated instruction going stepwise from landmark to landmark). This was
then followed by the other speaker (TP03) backtracking because he could not follow
the instructions KP04 was giving subsequent to moving from the lightning on his
map. This is the issue that KP04 signals as having been settled by uttering allim at
264.0, and with the turn at 265.0, he thus indicates that the following path instructions
are to be understood with reference to the lightning (again), as opposed to
the subsequent landmark that left TP03 stranded. Because of this, and the stepwise
progression from landmark to landmark, it seems reasonable to assume that
a superquestion something like FROM WHICH LANDMARK, WHERE DOES THE PATH GO?
plays an important role in the structure of their discourse, and that the utterance
spanning 265.0 and 267.5 is to be understood as a partial answer to that, with tillakupitaqa an
implicit contrastive topic in the sense laid out above. In summary, we can see that
the approach pioneered by Riester and colleagues can be used to make quite a bit of
headway⁷⁵ into an information-structural analysis of spontaneous speech corpora
without making reference to formal means of encoding it, and it will be used
in this way in parts of this work to separate these meaning-based categories from
potential prosodic cues.
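
The strategy of inquiry behind contrastive topics described above, in which a superquestion with two wh-variables is split into parallel subquestions, can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, and the subscript CT for contrastive topics is a notational assumption added here alongside the F used for focus in (28) and (29).

```python
from typing import Dict, List

def subquestions(topic_alternatives: List[str], template: str) -> List[str]:
    """Fill the first wh-variable of the superquestion with each member of
    its answer set, yielding one parallel subquestion per alternative."""
    return [template.format(who=t.upper()) for t in topic_alternatives]

def annotate(answers: Dict[str, List[str]]) -> List[str]:
    """Mark the answer to the preceding QUD as contrastive topic (CT) and
    the answer to the current subquestion as focus (F)."""
    return [f"[{shearer}]CT sheared [{', '.join(sheep)}]F"
            for shearer, sheep in answers.items()]

# Superquestion: WHO SHEARED WHICH SHEEP?
ANSWERS = {
    "Sean": ["Bessy", "Berta", "Balu"],
    "Sian": ["Billy", "Bartholomew", "Boris"],
    "Shane": ["Bob", "Bonita", "Brenda"],
}

for q in subquestions(list(ANSWERS), "{who} SHEARED WHICH SHEEP?"):
    print(q)
for line in annotate(ANSWERS):
    print(line)
```

An implicit contrastive topic in the sense discussed for tillakupitaqa would correspond to a discourse in which only one of the parallel answers is realized overtly, while the remaining alternatives have to be recovered from the wider context.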


3.7.3 Prosodic cues and information structure 


A strictly context-based definition of information structural categories still leaves 
the question open whether formal means of expression signal information struc- 
ture directly. However, a view often adopted in the literature on prosody and into- 
nation is that this signaling is only indirect (Ladd 2008; Calhoun 2010b; Kügler & 
Calhoun 2020). According to this view, a constituent e.g. being in focus is not auto- 
matically linked to some kind of prosodic exponent; instead, information structural 
division of utterances is signaled indirectly via metrical and prosodic structure, 
which is what is cued by phonetic exponents in turn. In the following two sections, 
I will review evidence for this hypothesis and refine the relation between informa- 
tion structure and prosodic cues first for Spanish, and then for what is known in 
this regard about (Cuzco) Quechua. 


75 Although the analysis especially of implicit contrastive topics is not yet fully satisfying, as Ri- 
ester & Shiohara (2018: 301) admit. My own analysis will often be based on reasoning drawing on a 
consideration of larger parts of the nonlocal discourse context, as exemplified above. 
3.7.3.1 Spanish

In fact, much of the literature on prosodic focus marking in Spanish can be read as 
evidence for precisely this indirect relation. Cases in point are Gabriel (2007) and 
Vanrell & Fernández Soriano (2018). Both investigate prosodic and syntactic strate- 
gies of signaling information structure in assertions via a question-based elicitation 
paradigm involving speakers from several varieties of Spanish.⁷⁶ They find that the
prosodic cues involved in signaling the information structural division between
background and focus in declaratives include a high iP-boundary tone (H-) occurring
at the right edge of the prefocal background material,⁷⁷ an LH* pitch accent on
the final prosodic word of the focal material, a low iP-boundary tone (L-) at its right
edge, and pitch accent compression or deaccentuation on the postfocal background
material. While all of these cues were attested in productions from speakers of all
varieties in both studies, they were also all optional to some extent. In Gabriel
(2007: 276-277), H- at the right edge of the prefocal background occurred in 81.3%
of cases (113 of 139 possible utterances), with individual occurrence rates ranging
from about 55% (3 speakers) to 100% (4 speakers) and the other speakers distributed
evenly among intermediate rates. Across speakers, occurrence was more frequent
when the focus domain was narrower (i.e. the focal material consisting of
fewer elements in correspondence to the QUD) than when it was broader. Postfocal
deaccentuation/pitch accent compression (cf. section 3.4.6) has slightly lower occurrence
rates, ranging from 40% to 83% across the relevant elicitation contexts and
also showing strong individual differences between speakers. Again, the width of
the focus domain seemed to play a role, with a group of utterances where the focal 
material consisted of more than a single prosodic word never occurring with deac- 
centuation on the available postfocal backgrounded material (Gabriel 2007: 282). 
Vanrell & Fernández Soriano (2018) report on similar tendencies with regards to 
these two cues, but do not offer precise numbers. With regards to the realization of 


76 In the case of Gabriel (2007): 18 speakers, of which 14 from various Spanish regions, all judged
to be speakers of “standard Castilian Spanish”, and one each from El Salvador, Colombia, Mexico,
Argentina, of which only the last two are judged to not be speaking “standard” Spanish by Gabriel
(2007: 269). In contrast, Vanrell & Fernández Soriano (2018: 38) present data from 9 speakers, 2
from the Canary Islands, 2 from the Basque country with Basque as L1, another 2 from the
Basque country with Spanish as L1, and 3 from Madrid; they take them to speak at least four different
varieties of European Spanish.

77 Gabriel (2007: 280—281) also finds evidence that the segmental prosodic process of sinalepha, 
the assimilation of adjacent vowels across word boundaries, is blocked between the right edge of 
the final phrase containing backgrounded material and the left edge of the phrase containing focal 
material. For a study on variant phonetic realizations of the H- boundary tone, see Gabriel et al. 
(2011). 
the focal material itself, both studies find that one alternative to the LH* L- contour
is L* L%, which occurs only when the focal material is final in the utterance (but
then not always, cf. also Hualde & Prieto 2015: 364). In the data reported on by
Vanrell & Fernández Soriano (2018: 54), it is also not restricted to cases of broad
focus. Vanrell & Fernández Soriano (2018: 43) attest a third option, L<H* H-,
which they describe as a “rise throughout the accented syllable which continues to
the end of the intermediate phrase”. They describe this contour only in cases where
the focal constituent is a subject and occupies an initial position due to it-clefting
(e.g. Fue [Juanita]F la que vio siete armadillos);⁷⁸ it is not followed by postfocal
deaccentuation or compression (Vanrell & Fernández Soriano 2018: 43-54). The latter
fact suggests a possible connection between the presence of the L- and subsequent
deaccentuation or compression.
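
For orientation, the proportions behind the counts cited in this section can be recomputed directly. The snippet below only redoes the arithmetic on the figures quoted from Gabriel (2007) and Vanrell & Fernández Soriano (2018); the helper name `rate` is a hypothetical convenience, and no new data are introduced.

```python
def rate(count: int, total: int) -> float:
    """Occurrence rate in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)

# H- at the right edge of the prefocal background (Gabriel 2007: 276-277)
print(rate(113, 139))  # 81.3

# Word order counts cited from Vanrell & Fernández Soriano (2018: 48, 54)
print(rate(81, 182), rate(26, 182))    # "information focus": in-situ vs p-movement
print(rate(114, 300), rate(34, 300))   # "contrastive focus": in-situ vs p-movement
```

In-situ order thus accounts for 44.5% and 38.0% of the responses in the two context types, against 14.3% and 11.3% for p-movement.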

Both studies interpret their results in such a way that the right edge of the 
focal material seeks to align with the right edge of an iP, and with the most prom- 
inent position in that iP. In view of the definition of nuclearity adopted here from 
Ladd (2008) (cf. section 3.4.4), where the nuclear accent is the only obligatory and 
final one in an iP, we can simply say that focus is preferentially aligned with a 
nuclear accent. What the results of these studies (and many others) further show 
is that it is misleading to speak of prosodic cues for focus. The cues discussed 
above are not only used in the contexts described: H- and L- are simply cues for 
iP-phrasing, which is certainly not wholly determined by information structure. 
Deaccentuation and locally reduced pitch span also occur in contexts that are not 
postfocal (cf. Ortega-Llebaria & Prieto 2007; Torreira et al. 2014). Summing up, the 
division between focus and background is clearly one important factor influenc- 
ing iP-level phrasing and nuclear accent placement in Spanish, but its signaling 
does not make use of a unique phrasing strategy, or one that is reserved exclu- 
sively for this function. 


78 Note that by and large, syntactic strategies are found to be at least as variably associated with 
information-structural configurations as prosodic ones in both of the studies: preferences for dif- 
ferent types of clefts, fronting, in-situ word order (with prosodic marking as discussed in the text) 
and the postposing of the focal material (p-movement in Zubizarreta 1998) vary somewhat across 
focus type and variety, but mostly across speakers, and are thus only ever tendencies, with the gist 
being that clefts are preferentially used in contrasting or correcting contexts, and that in-situ word 
order is overall far more preferred than p-movement, which is rather marginal (81/182 for in-situ
vs 26/182 for p-movement in “information focus” contexts, 114/300 for in-situ vs 34/300 for
p-movement in “contrastive focus” contexts in Vanrell & Fernández Soriano 2018: 48, 54; cf. Gabriel 2007:
283, 289-290). However, some type of syntactic strategy resulting in a deviation from unmarked
word order seems to be employed in roughly half of the cases. 
While results on phrasing in Spanish indicate that focus is associated with the
highest prominence at the iP-level, variably expressed by a range of phonetic cues, 
results on the paradigmatic choice between nuclear configurations also indicate 
that focus is not the determining factor for the choice of pitch accents and bound- 
ary tones. Nuclear configurations occur in focus position, but the choice of config- 
uration can encode additional pragmatic meaning such as illocutionary force and 
modal evaluative meaning. 

Returning to the three focal nuclear configurations just discussed, the paradigmatic
choice between LH* L-, L* L%, and L<H* H-, and indeed between these three
variants and other nuclear configurations, does not differentiate between what
is focal and what isn’t.⁷⁹ Between those three, on the one hand, the choice is largely
determined by position, with L* L% occurring IP-finally, LH* L- IP-medially, and
L<H* H- seemingly optionally when important material is phrased separately
IP-initially.⁸⁰ With regards to the contour conventionally transcribed as L* L%, its most


79 Or between “informational” focus and “contrastive” focus (a focal choice between a finite set of
salient candidates), as is sometimes claimed: both Gabriel (2007) and Vanrell & Fernández Soriano
(2018) explicitly test for this difference and come to the conclusion that it does not affect prosodic
realization in any of the dimensions discussed here (“contrastive” focus is also sometimes related
to a notion of “emphasis” that is supposedly expressed in increased pitch scaling).

80 Gabriel (2007: 285—287) also finds this contour in part of his data elicited through a reading 
task, where the context together with the word order of the elicitation sentence forces a reading 
in which the focal constituent is fronted. He dismisses instances of L<H* H- on such fronted con- 
stituents that denote the answer to the (explicitly given) QUD as an inappropriate reinterpreta- 
tion of them as (left-dislocated) topics by the speakers. Given that fronting as a focusing strategy 
is never attested in his more spontaneous data and that in clefts, the occurring tonal movement 
is analyzed as LH* L-H% (Gabriel 2007: 283), this characterization is certainly not unreasonable. 
However, fronting (without claims about its intonational specification) has been suggested to con- 
vey a mirative or counterexpectational import in Spanish (Leonetti & Escandell-Vidal 2009; Reich 
2018; Cruschina 2019). The lack of such a pragmatic specification in the elicitation contexts might 
go some way in explaining its scarcity in the data reported on by Gabriel (2007), and in the light of 
the findings by Vanrell & Fernández Soriano (2018), it seems plausible that L<H* H- is not a misin- 
terpretation (under the purely contextual definition of focus adopted here, that is strictly speaking 
impossible if the experimental task was not itself misunderstood), but simply an option available 
for phrasing important IP- or utterance-initial material in its own iP. That could occur both in
it-clefts and constructions with initial topics, so that “important material” would have to remain as a
placeholder until further research provides more precise insights. Under the view adopted here,
prosodic configuration, syntax, and discourse context would then act together to yield an
interpretation disambiguating the information structural role of the fronted constituent.

An additional point could be made regarding the interpretation of L<H* H- based on its pho- 
netic realization: Gabriel et al. (2011) show that the phonetic realization of H- has a range of vari- 
ants, in most of which the pitch is affected not only in the final syllable of the phrase, but already 
in the preceding ones including the tonic and posttonic. That makes it difficult to tell whether the 
conspicuous feature is the absence of any pitch excursion on the nuclear accented
syllable, as noted by Hualde & Prieto (2015: 364), who also call this an “accent
without tonal correlates”. That such a featureless contour is regularly interpreted
to cue rightmost prominence in the IP can best be explained by the strong expectational
bias for this metrical configuration, an observation already partly encapsulated
in the formulation of the “nuclear stress rule” (Chomsky & Halle 1991 [1968]:
17–25). Prosody corresponding to this unmarked rightmost default prominence
does not seem to need to be cued overtly most of the time; only marked deviations
from it do (cf. Ladd 2008: 223, 257–259). This explains why focus in prefinal positions
is associated with the acoustically more prominent nuclear configurations
LH* L- and L<H* H-.

The paradigmatic choice between these three and other nuclear configura- 
tions, on the other hand, seems to encode pragmatic meaning at the illocutionary 
level and that of additional non-at-issue meanings: Fliessbach (2023) shows conclu- 
sively that non-at-issue meanings conventionally described as mirativity and obvi- 
ousness, as well as whether an assertion is an agreeing or disagreeing response or 
a neutral provocation, have a clear effect on the nuclear configuration in declara- 
tives even when focus is kept totally constant. A similar intuition also informs the 
data in Table 3 from Hualde & Prieto (2015) and much of the literature contributing 
to it. This means that LH* L-/L<H* H-/L* L% should be seen to signal something like
a neutral or unmarked declarative, to which focus position is orthogonal.

Based on this discussion, a more appropriate conceptualization for the relation 
between information structure and prosody is the following: the suprasegmental 
realization across the entire utterance cues a prosodic structure and a prominence 
profile (i.e., a metrical structure) that can be brought into relation with informa- 
tion structurally relevant divisions of the utterance. This cueing is asymmetrical 
because it is based on default expectations: as seen above, the expected, unmarked
case of rightmost prominence at the highest level of metrical structure is often
not given acoustically prominent pitch cues at all (cf. Calhoun 2010b: 11–13). The
argument can further be made that even the nuclear accent is truly a metrical-pro- 
sodic category whose association to focus is only preferential, but not categor- 
ical (cf. Ladd 2008: 263-273). Evidence that this is true for Spanish comes from 
Calhoun et al. (2018), with similar arguments made for English in Calhoun (2010b). 
Calhoun et al. (2018) investigate the prosodic and syntactic realization in Venezue- 
lan Spanish utterances (n=651 from 9 speakers from Valera) consisting of intransi- 
tive sentences under different information structural conditions and accounting 


pitch accent on the iP-final word is LH* or L<H* based on peak alignment. While Vanrell & Fernández
Soriano (2018) opt for the delayed version (L<H*), Gabriel et al. (2011) analyse it as LH*.
for the difference between unaccusative and unergative verbs. They performed a
picture-based elicitation task that allowed speakers to produce utterances freely in
terms of syntax and prosodic realization. See (32) for examples.


(32) Unergative and unaccusative Spanish example sentences, after Calhoun
     et al. (2018)
     a. La chica estornudó                        unergative⁸¹
        “The girl sneezed”
     b. La chica apareció / Apareció la chica     unaccusative
        “The girl appeared”


Utterances were coded by the three authors for the position of nuclear stress (i.e.
highest prominence in the utterance, on either the initial or final of the two lexical
words) independently of the elicitation context. Acoustic analysis shows that this
coding corresponds to a clear mean F0 difference in favour of the initial accent in
the case of initial stress (final stressed syllable 23 Hz lower) and to roughly equal
pitch height in the case of final stress (final stressed syllable 1 Hz lower), but only to a
comparatively small difference in syllable length: the final stressed syllable is 50 ms
longer under initial stress and 67 ms longer under final stress
(Calhoun et al. 2018: 16). That the coding thus corresponds to such a complex acoustic
correlate is in itself evidence that strength relations as modeled via metrical
structure are above all based on complex, context-dependent expectations (see also
section 3.3.2).

In the analysis of their data, Calhoun et al. (2018: 18–20) find that the majority
of their intransitive utterances (56%) have the word order subject-verb ((32)a
and the first alternative of (32)b) with rightmost nuclear stress. All in all, 16%
of utterances have the word order verb-subject (the second alternative in (32)b),
mostly also with final nuclear stress. The remaining 28% exhibit subject-verb word
order with initial nuclear stress. These ratios vary considerably across conditions,
with contexts eliciting a correction (corrective focus) on the subject, those in which
the subject is focused as answer to the QUD but not in correction (information
focus), and those asking the question “what happened?” (broad focus) having significantly
different rates of nuclear stress on the subject (53.3%, 32.5%, and 12.8%,
respectively). Note that even in the corrective focus condition, 20.7% of utterances


81 The classification here is taken from Calhoun et al. (2018: 7-10) and the works their discussion 
is based on. In generative syntax, unergative verbs are those whose single argument is VP-external, 
while the single argument of unaccusative verbs is VP-internal. This has been related to semantic 
verb types, such as ones denoting an uncontrolled process (unergative), or changes of location 
(unaccusative). 




still have subject-verb order with final nuclear stress, i.e. the focal position is not 
marked. For information focus, this number increases to 49%, so that nearly half 
of all occurrences do not mark the focal position (the remaining 26% and 18.5%, 
respectively, have verb-subject order). In contrast, in the broad focus contexts, less 
than 30% (28.8%) of occurrences do not have subject-verb order with rightmost 
stress. Only here, the verb type also accounts for a large and significant difference: 
with unergative verbs in the broad focus condition, 88% are subject-verb with 
rightmost stress, whereas with unaccusative verbs, only 54% are, while 19% have 
the same word order with initial stress, and 28% have verb-subject word order. 
In the condition of information focus, the difference due to verb types is much 
smaller and just about reaches significance (p = 0.04), while it entirely disappears in
the corrective focus contexts. This means that even though both information struc- 
ture as induced through discourse context and verb type (to a lesser degree) have 
demonstrable effects on nuclear stress placement, there is a clear default to place 
it rightmost, even when this means providing no overt cues to focus position. The 
relationship between prosodic/metrical structure, semantics and syntax in the sig- 
naling of information structure is clearly not categorical, but can only be conceived 
of as probabilistic and heavily shaped by contextual expectations (cf. Calhoun 
2010b; the assumption of a distributional relation between information structure, 
prosody, and syntax is also evident in the use of stochastic OT for their modeling by 
Gabriel 2007). The role of expectations is also implicitly invoked by Calhoun et al. 
(2018: 22) when they say that the corrective focus condition is more often overtly 
cued because it is more informative. Such degrees of informativity can be modeled 
in a discourse model such as the one by Farkas & Bruce (2010), where the assertion 
p of a correction against a proposition q which is already in the commitment set 
of the interlocutor would entail an empty projected set and would therefore be a 
marked move. Since the acceptance of such a move necessitates the removal of q 
from the commitment set of the interlocutor in addition to entering p into CG, it is 
more informative than an assertion that is not a correction. A plausible hypothesis
is therefore that the more unexpected (i.e. informative) and therefore pragmatically
marked a move is, the more likely it is to receive a prosodic form that
is likewise marked, in the sense that it cues a metrical structure that is not the default.
Interesting further implications arise from the conception of the nuclear accent 
as belonging minimally at the level of the iP, and the distributive relation shaped 
by expectations between metrical/prosodic structure and information structure: it 
makes the prediction that several nuclear accents can stand in a metrical relation- 
ship with each other, and that utterances produced with several of them can still 
be assigned e.g. an unmarked, i.e. rightmost relationship overall, and thus receive 
essentially the same interpretation in terms of information-structural division 
as smaller utterances consisting of only a single iP/IP (see also Ladd 2008: 271). 
Calhoun (2010b: 6) claims that this is the case for English and uses the constructed
example given in Figure 16 to demonstrate that what she calls an “emphatic rendition”⁸²
of Arun bought a Porsche, with each content word produced as a separate
iP/IP (she assumes only a single phrasal category), can still be an answer to “What
happened?”, in that bought and a Porsche, each a phrase with a nuclear accent, are
assigned the relation w-s at a higher node, which then receives the highest prominence
at the highest node, assigning Arun w only at this level. While the idea in
itself is compelling, it suffers somewhat from being little more than a thought
experiment in this form. From an intonational perspective, investigating such
utterances consisting of several iP/IPs is also worthwhile with regard to the question of
how prosodic structure might deal with them and what this might mean for the discussion
about recursive prosodic structure (cf. section 3.6). As a first approximation to
how such multi-iP/IP utterances might actually play out, we can consider the two
utterances in Figure 17 and Figure 18, which are basically attempts at recreating
something like Calhoun’s invented example for (Lima) Spanish.


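[Metrical tree: (ARUN), (BOUGHT), and (a PORSCHE) are each phrased separately with their own nuclear accent (Nuc); BOUGHT (w) and a PORSCHE (s) combine at a higher node, which is in turn s relative to ARUN (w) at the highest node.]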


Figure 16: Metrical structure of an “emphatic rendition” of the
sentence Arun bought a Porsche, adapted from Calhoun (2010b: 6).


82 The context Calhoun (2010b: 5) evokes for the constructed example is that “the speakers are
so surprised they produce every word in a separate phrase”. This is perhaps not fully convincing,
especially without further intonational specification, but intuitively, I would agree that utterances
in which each or nearly each word is phrased separately do exist.

83 The speaker is the same speaker from Lima who also produced the utterances shown in 
Figure 7, Raúl Bendezú Araujo, who was kind and patient enough to record himself speaking them 
according to my instructions, for which I owe him thanks. They are of course totally artificial and
serve only to illustrate the problem here.
Figure 17: María miró a la gallina amarilla ‘Maria looked at the yellow chicken’, intended as an answer
to “What happened?”, spoken by a Spanish speaker from Lima, normal rendition. 




Figure 18: María miró a la gallina amarilla ‘Maria looked at the yellow chicken’, intended as an answer 
to “What happened?”, spoken by a Spanish speaker from Lima, “insistent” rendition. 


Both Figure 17 and Figure 18 are productions of the utterance María miró a la gallina
amarilla “Maria looked at the yellow chicken” as an answer to the question “What
happened?”. While in the case of Figure 17, the speaker was told to produce the utterance
as naturally as possible in this context, Figure 18 is the result of being asked
to produce the utterance to the same QUD, but explicitly with added insistence, 
speaking slowly and hyperenunciating, as if talking to someone hard of hearing or 
slow on the uptake. I have refrained from transcribing pitch accents in Figure 18, 
both because Lima Spanish has not really been given an intonational account in the 
AM framework to my knowledge, apart from the inclusion of equally unanalyzed 
individual examples in the Atlas interactivo de la entonación del español (Prieto & 
Roseano 2009–2013), and because the example is so obviously artificial.⁸⁴ It is still
instructive to make a comparison with Figure 17. There, each content word real- 
izes an LH* pitch accent, and a high rising pitch movement suggests the presence 
of a H- on or after a la, indicating that an iP-level boundary occurs there, while 
finally, a L% can be safely assumed. In comparison, Figure 18 seems to consist of 
a far greater number of phrases, visible not only from the pauses but also from 
the accompanying pitch movements indicating boundary tones, which I have ten- 
tatively transcribed. What is interesting is on the one hand that seemingly each 
single prosodic word (including the clitics a la, which seem to have been promoted 
to prosodic word status) is indeed realized as an individual iP and thus with its 
own nuclear accent, and that on the other hand, prominence relations between 
them can still be made out. The boundary rise delimiting miró is scaled far higher 
than any of the others, suggesting perhaps a stronger boundary there (roughly cor- 
responding to the position of the H- in the “natural” Figure 17). The pitch accent 
on the final word, amarilla, is also quite certainly different from the preceding 
ones, further suggesting a structure above the level of the individual iPs, presumably
an IP, with respect to which this pitch accent is final. Assuming that Figure
18 is indeed an appropriate answer to the question “what happened?”, this
incidental comparison does seem to bear out the prediction by Calhoun (2010b)
regarding the assignment of prominence relations at successive levels above
the one at which the nuclear accent is assigned. In section 5.2 I will introduce
a set of less artificial elicited Huari Spanish utterances that share the property of
apparently assigning prominence relations at multiple levels of “nuclear accents”. There
I will provide an analysis of them that connects the indirect relation between
phrasing, prominence, and information structural categories with the question of
prosodic recursion in Spanish.

Broadening the focus somewhat, we can finally turn to Kügler & Calhoun (2020) 
to see how the prosodic cues to information structure found for Spanish relate to 
a wider typological context. They describe three main strategies in which informa- 
tion-structural categories (mainly focus) can be signaled in this indirect fashion via 
prosodic structure crosslinguistically. In our discussion of Spanish we have seen all 
three of them employed, namely prominence (focus seeks to align with the highest 
stress in the phrase which is often realized with the most acoustically prominent 
pitch movement), phrasing (focus seeks to align with prosodic edges) and pitch reg- 
ister (the focus-background division of the utterance corresponds at least partially 


84 We might note that the majority of identifiable pitch accents seem to be falling, perhaps instances
of HL*, which occurs as part of HL* L% in Table 3 with the label “insistent explanation/
insistent request”.
to an asymmetrical difference in pitch register and span, in that postfocally, pitch
movements are reduced and occur in a lower register). Languages seem to differ
in the degree to which they prefer and implement each of these strategies.


3.7.3.2 (Cuzco) Quechua 

In the case of Quechua, very little research has been done regarding prosodic cues 
for information structure. Cole (1982: 210-211) describes for Imbabura Quechua 
(Quechua IIB, Ecuador) that there is a single main intonational peak in an utterance 
and that it is normally located on the final word, followed by a final fall independent
of whether the utterance is a declarative, polar interrogative, or wh-interrogative.
The main peak can move to a non-final word if that word is “emphasized or
contrasted”. If an utterance consists of several breath groups (i.e. smaller phrases),
they can each bear a main peak. For Cuzco Quechua (Quechua IIC, southern Peru), 
Cusihuamán (2001: 79-81) does not mention a similar shift in the position of a main 
pitch peak according to the position of what is felt to be the “contrasted” word; he 
only describes pitch movement on the last two syllables of the utterance (using 
a four-level system), indicating that while declaratives, interrogatives, and imper- 
atives have a final fall (with different pitch spans), exclamative utterances have 
elevated pitch only on the final syllable and end high. These claims are based
on impressionistic data; the only works that I am aware of that deal with prosodic
cues for information structure on an instrumental basis are again concerned only
with Cuzco Quechua. O'Rourke (2005: 61–68) tentatively proposes that there are
no prosodic cues to information focus⁸⁵ in declaratives, neither in terms of peak


85 Her criterion for whether a constituent is in focus is that it is non-initial and marked by the
evidential suffix -m/mi, after a proposal by Muysken (1995) that the evidentials -mi/-chi/-shi mark
focus on the constituent they attach to (in initial position he states that they can also have scope
over the entire clause). It seems, however, that the relationship between the position of focus and
the presence of the evidentials can at most be characterized in such a way that if there is an evidential
in a sentence, the constituent it attaches to is focal, but not the other way round: this much
is clear from the frequent attestation of sentences without any evidentials. The characterization in
Weber (1989: 427–429) for Huallaga Huanuco Quechua (Quechua I, central Peru) is more cautious:
he asserts that broadly, what parts of a sentence are thematic (i.e. topical), and which are rhematic
(i.e. focal), can be determined from the interplay between the distribution of the “topic marker”
-qa, the evidentials, and the position of the verb. Optional -qa-marked initial constituents are thematic,
followed by a rhematic part which may contain constituents marked with the evidentials,
and then the verb, followed optionally by further -qa-marked thematic constituents. He explicitly
warns against simply identifying the evidential-marked constituent as the last or first rhematic
one and also attests sentences with more than one evidential, which is ungrammatical in Muysken
(1995: 381–382). This suggests a relationship (perhaps dependent on the language variety) between
focus position and evidential marking that is quite similar to the distributional relationship be- 
alignment nor of postfocal downstep or deaccentuation, but stresses that further
research is needed because her analysis is based on only a small number of individual
examples. O'Rourke (2009) somewhat qualifies this assessment. She proposes
a regular LH* pitch accent⁸⁶ on stressed syllables (taken by her to be the penult
in Cuzco Quechua) in declaratives, combined with iP-initial L- and optional iP-final
H- in non-final iPs, and iP-final L- in utterance-final iPs (O'Rourke 2009: 308–309).
Overall, peaks are aligned within the tonic syllable and on average show downstep
across the utterance, but based again on the observation of individual utterances,
this downstep pattern is said to be overruled by the focal constituent (again identified
via the presence of -n/mi marking) having the highest peak in the utterance,
suggesting highest prominence (O'Rourke 2009: 302–304, 307–308). Again, no evidence
for postfocal deaccentuation or compression is found.

I want to conclude the section on prosodic cues and information structure by 
discussing in some more depth Muntendam & Torreira (2016), a study that to date 
is unique not only in that it experimentally investigates Quechua prosody under 
different information structural conditions, but also in providing comparative data 
from two Spanish varieties. Because their findings are so relevant for the present 
work, I will also address some important methodological shortcomings, but it 
should be clear that their pioneering contribution is extremely valuable to the aims 
of my own study because in many aspects it represents the only comparable work 
on at least related varieties. Muntendam & Torreira (2016) investigate the effect
of information structure on prosody in Cuzco Quechua and Cuzco Spanish as produced by the
same bilingual speakers (16 speakers), and in “Peninsular” Spanish (7 speakers from Castile
and León, 1 from Murcia), using a question-based task to elicit short utterances
consisting mainly of a noun phrase made up of an adjective and a noun in contexts of broad
focus, contrastive focus on the noun, and contrastive focus on the adjective. Speakers
participated in pairs and were given a stack of cards containing preformulated
questions and coloured objects from which answers were to be built. As can be seen
from (33)b and (33)c, “contrastive focus” on the adjective and on the noun was elicited
by asking a polar question in which the other element in the noun phrase was given

tween focus and prosodic cues argued for here. Inarguably, the evidentials and other morphosyn- 
tactic devices in Quechua interacting with focus all convey an additional (paradigmatic) meaning 
that is orthogonal to their use as focus markers, see e.g. Faller (2002, 2003, 2014); Behrens (2012); 
Bendezú Araujo (2021) for accounts.

86 Actually she proposes an L*H for pitch accents in prefinal, and an LH* for those in final position 
in the utterance, based on peak alignment data. However, she also admits to the possibility that 
since peaks are nearly always realized within the stressed syllable in all positions, the pitch accent 
could also be taken to be LH* in general, and the alignment difference as phonetic (O'Rourke 2009: 
309, note 16). The classification is clearly tentative and awaits further research. 




and correct, while the adjective or the noun, respectively, did not correspond to the
object shown on the card for the answering speaker and was thus intended to be
“corrected”⁸⁷ in the elicited assertions. Broad focus was elicited by asking a wh-question
about what object the answering speaker had on their card ((33)a).


(33) Example elicitation dialogues in Spanish, from Muntendam & Torreira
     (2016: 75)
     a. Q: ¿Qué tienes?
           “What have you got?”
        A: Tengo una luna morada
           “I have a purple moon”
     b. Q: ¿Tienes una flor morada?
           “Have you got a purple flower?”
        A: No, tengo una luna morada
           “No, I have a purple moon”
     c. Q: ¿Tienes una luna negra?
           “Have you got a black moon?”
        A: No, tengo una luna morada
           “No, I have a purple moon”


For each language variety, Muntendam & Torreira (2016) identify several attested
intonational contours in the responses and their frequency of occurrence⁸⁹ across


87 Note that according to Farkas & Bruce (2010: 96), polar questions such as those asked in (33)b
and (33)c are unbiased with regard to their response (whether the proposition asked for is true
or not). The answers elicited here thus constitute reversals, which are different from denials in
the sense that no commitment has been made in the provocation. It could thus be argued that
the difference between the “corrected” and the “uncorrected” element in the responses is simply
one of relative givenness, which is very variably cued prosodically across languages (Cruttenden
2006; Calhoun 2010b). However, it could also be considered that these responses form a particular
subclass of reversals that might be called partial reversals, in analogy to the partial denials that are
described as possible responses to assertions (Farkas & Bruce 2010: 99–100).

88 Example elicitation dialogues for Cuzco Quechua are not provided in Muntendam & Torreira
(2016). They would have been interesting with regard to possible differences in the placement
of the polar question marker -chu (-ku in Conchucos Quechua), which can attach to different
constituents and possibly be used to mark the question focus (i.e. to specify which
constituent the QUD asks about, cf. O'Rourke 2005: 183). As far as I can tell, the intonational form of the
elicitation question, produced by the participants themselves, was not controlled for. This applies
also to the Spanish part of the experiment, for which it has been shown that the intonational form
of the question can have a significant influence on the form of the response (Fliessbach 2023).

89 Originally, speakers produced 20 target utterances per condition, yielding 960 utterances in 
bilingual Cuzco Quechua and Spanish each, and 480 utterances in Peninsular Spanish (Muntend- 
the experimental conditions, reproduced together in Figure 19. For Peninsular
Spanish, the results fall well in line with previous findings as well as with the theory
of an indirect relationship between information structure and prosodic cues.

As can be seen from Figure 19, all three attested contours⁹⁰ for Peninsular
Spanish occur in all three experimental conditions. Only tendencies can be made 
out in both directions of association: neither is an experimental condition exclu- 
sively linked to a single contour, nor an observed contour exclusively occurring in 
only one condition (with the (c) contour in the ContrN condition however coming 
closest). The same observation can be made correspondingly for the other two 
language varieties, making a direct encoding of information structure via prosody 
unlikely (Muntendam & Torreira 2016: 78, 84-85). The three contours identified 


am & Torreira 2016: 75). For all languages, utterances including hesitations or longer pauses were 
excluded. Only for the Quechua data, all utterances containing case markers and all utterances 
in which the target NP containing the adjective and the noun were not utterance-final were also 
excluded, because “this is the natural position for the corresponding NPs in Spanish" (Muntendam 
& Torreira 2016: 76–77). For Peninsular and Cuzco Spanish, the elimination resulted in 396 and
600 remaining utterances for analysis respectively, but in the case of Quechua, only 227 utterances
were included in the final analysis, a reduction by more than 75%. Assuming that a similar amount
of elimination due to hesitation took place in the Quechua data and the Cuzco Spanish data, leaving 
some 600 utterances, then still more than 62% of the remaining data must have been eliminated 
only due to the presence of case markers or non-NP-final word order. It is difficult to assess how 
likely the utterances were to include case markers, because no Quechua example sentences are 
given in the paper and several constructions are possible in Quechua to convey the content of the 
examples in (33), some of which would not regularly have to contain any case markers at all. On the 
other hand, it can be assumed that the target utterances in Quechua were as simple as the Spanish 
ones, effectively consisting of only a verb and the target NP. That means that whether the target 
NP is final comes down to a binary option, and potentially the excluded occurrences represent the 
unmarked majority option, in which the verb is final. This issue is not addressed at all in the paper, 
but eliminating the word order option that represents the majority of cases should be a cause for 
concern. As it stands, there are already a number of studies about word order in Quechua, including
the Cuzco variety, that all point to the existence of a relation between word order and information
structure, and a trend for verb-finality (Wölck 1972; Weber 1989; Muntendam 2010; Sánchez
2010). In the light of these assessments, it seems likely that the order V-NP is itself a marked word
order and/or a relevant cue to information structure. The decision to eliminate it from the analysis 
perhaps thus means that data from a marked word order in Quechua is compared to that from an 
unmarked word order in Spanish. If this is the case, it could cast doubts on both the comparability 
of the findings on Quechua to those on the Spanish varieties (internal validity) and the possibility 
for generalization of the Quechua analysis beyond this sample of data (external validity). 
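
The proportions cited in this note can be retraced with a few lines of arithmetic. The following Python sketch is only an illustration: the utterance counts are those reported above, while the variable names, the helper function `reduction`, and the assumption that hesitations removed as many utterances from the Quechua data as from the Cuzco Spanish data are introduced here for the purpose of the calculation and are not taken from Muntendam & Torreira (2016).

```python
# Counts as reported in this note (after Muntendam & Torreira 2016: 75-77).
SPEAKERS = 16                  # bilingual Cuzco speakers
CONDITIONS = 3                 # broad focus, ContrN, ContrA
UTTERANCES_PER_CONDITION = 20  # target utterances per speaker and condition

ORIGINAL_QUECHUA = SPEAKERS * CONDITIONS * UTTERANCES_PER_CONDITION  # 960
ANALYSED_QUECHUA = 227         # Quechua utterances retained for analysis
ANALYSED_CUZCO_SPANISH = 600   # Cuzco Spanish utterances retained for analysis


def reduction(before: int, after: int) -> float:
    """Share of utterances lost when going from `before` to `after`."""
    return 1 - after / before


# Overall loss in the Quechua data.
overall = reduction(ORIGINAL_QUECHUA, ANALYSED_QUECHUA)

# Hypothetical step (an assumption, not a reported figure): if hesitations
# alone had left about 600 Quechua utterances, as in Cuzco Spanish, the rest
# of the loss is due to case markers or non-final target NPs.
structural = reduction(ANALYSED_CUZCO_SPANISH, ANALYSED_QUECHUA)

print(f"overall reduction: {overall:.1%}")
print(f"reduction beyond hesitations: {structural:.1%}")
```

Under these assumptions, the first figure corresponds to the reduction "by more than 75%" and the second to the "more than 62%" of the remaining data mentioned above.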

90 Muntendam & Torreira (2016: 76) do not provide any details on how the contours were iden- 
tified, and how ambiguous cases and disagreements in annotation were handled. They do assert, 
however, that the ToBI-style pitch accent and boundary tone labels “only serve [...] the practical
purpose of distinguishing the contours" (Muntendam & Torreira 2016: 77, note 5) in their data, 
presumably as opposed to representing a fragment of grammatical analysis. 
Figure 19: Schematized versions of attested intonational contours for noun phrases consisting of
a noun and an adjective, una gallina rosa ‘a pink chicken’ / rosado nina ‘pink fire’, from Peninsular
Spanish (a, b, c), Cuzco bilingual Quechua (d, e), and Cuzco bilingual Spanish (b, d, e). Note the
reversed word order between the two languages. Tables give number of occurrences of the identified
intonational contours, per language and experimental condition, all adapted from Muntendam &
Torreira (2016: 77–78, 81, 83).
for Peninsular Spanish are familiar and in broad agreement with the literature
already discussed above. The (c) contour in Figure 19 is easily identified as the 
typical Spanish declarative nuclear contour in which the last word does not form 
an observable pitch peak at all, leading to an interpretation of final prominence by 
virtue of the expectational bias in that direction. Its status as an unmarked default 
is here confirmed by the fact that it is not only the most frequently used contour 
in the broad focus condition, but also in the condition with contrastive focus on 
the (prefinal) noun, and the most frequent overall (48% of all cases). Both other 
contours are arguably prosodically more complex because, if we equate the (a)-con- 
tour with the LH* L- contour from Gabriel (2007); Vanrell & Fernandez Soriano 
(2018) and others, they involve an additional division of the IP into two iPs. The 
(b)-contour with the high prefocal iP-boundary tone is significantly more frequent 
in the condition with contrastive focus on the final adjective than the (c)-contour 
(Muntendam & Torreira 2016: 78), and the (a)-contour is clearly used most fre- 
quently (but not exclusively) in the condition with contrastive focus on the prefinal 
noun. Crucially, the relatively fewer occurrences of the (a)-contour in the condi- 
tion of contrastive focus on the prefinal noun (33/130 occurrences, or 25%, in the 
ContrN condition) compared to the (b)-contour in the condition of contrastive focus 
on the final adjective (74/134 occurrences, or 55%, in the ContrA condition) cannot 
be explained via pragmatic or information structural markedness or degrees of 
informativity. In both conditions, focus is “contrastive” in the same way, but the 
(a)-contour is additionally marked in that the nuclear accent is not rightmost in the 
IP, i.e. the differential in the rate of occurrence can be explained when we make 
reference to the prosodic and metrical structure independently of the experimen- 
tal conditions standing in for information structure. The expectation-based default 
of rightmost highest prominence seems strong enough to prevent a contour from 
being realized that would align focus position with a prefinal highest prominence 
in the majority of cases. The flipside is that this naturally makes an occurrence of 
such a contour that much more markedly informative, as already suggested in the 
discussion of Calhoun et al. (2018). These results thus provide further evidence for 
the independent relevance of prosodic and metrical structure, and the proposal 
about the nature of its relationship to both phonetic cues and information struc- 
ture, which should be characterized as indirect: probabilistic, which is to say that 
from the perspective of an individual event, it is only possible to say that a certain 
information structural configuration will result in one of several realizations with 
a certain likelihood; and distributional, which is to say that only when observing 
many events can an association between an information structural configuration 
and a certain realization be made out, and intervening factors identified, in the 
form of trends in the distribution (cf. Calhoun 2010b). 
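
The two rates compared in this paragraph can be recomputed from the counts cited from Muntendam & Torreira (2016). The following Python lines are only an illustrative sketch: the helper name `rate` and the variable names are hypothetical and not part of the original study, and only the two pairs of counts quoted above are used.

```python
def rate(occurrences: int, total: int) -> float:
    """Share of one contour within one experimental condition, in percent."""
    return 100 * occurrences / total


# Counts as quoted in the text for Peninsular Spanish.
a_in_contr_n = rate(33, 130)  # (a)-contour, contrastive focus on the prefinal noun
b_in_contr_a = rate(74, 134)  # (b)-contour, contrastive focus on the final adjective

print(f"(a)-contour in ContrN: {a_in_contr_n:.0f}%")
print(f"(b)-contour in ContrA: {b_in_contr_a:.0f}%")
print(f"difference: {b_in_contr_a - a_in_contr_n:.0f} percentage points")
```

The point of the comparison is that both conditions involve contrastive focus, so the gap between the two rates has to be attributed to something other than the focus type itself.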

Broadly, the same generalizations can be drawn from the Cuzco Quechua 
results in Figure 19: both contours are used in all three conditions, but there is a 
preference for the contour with clearer rightmost prominence (e) in the conditions 
in which focus is broad or rightmost (on the noun). Effectively, there does seem 
to be a tendency to differentiate between prefinal and final prominence, but no 
additional differentiation corresponding to the information structural division of 
broad vs. narrow final focus, as achieved in Spanish through the preferential use 
of contour (c), without iP-phrasing within the IP, vs (b), where the given material 
is phrased off with a H-. An open question is whether this is due to Cuzco Quechua 
not differentiating between broad and final narrow focus, or because this phras- 
ing option is not available, or not used to separate given from new material. The 
Quechua results are also interesting in further ways. Both contours, (d) and (e), 
are at odds with the analysis of Cuzco Quechua declarative intonation in O’Rourke 
(2009). There, only a bitonal rising pitch accent LH* (see note 85) is proposed, but 
here the contours include the two monotonal pitch accents L* and H*. In addition, 
the high beginning in the (d) contour is also incompatible with the analysis in O’Rourke 
(2009: 304–305, 308–309), where a low boundary tone L- is proposed to be 
initial in every phrase. This suggests either that one of the analyses is incorrect, or 
that the contours in Muntendam & Torreira (2016) are not full contours in that they 
only characterize partial phrases. The discrepancy cannot be entirely explained 
away by saying that O’Rourke (2005, 2009) does not cover cases of contrastive focus; 
the L- LH* LH* L% contours we would expect from her analysis also do not occur 
here in the broad focus condition. It should be noted that since low pitch accents 
are notoriously difficult to identify and not much is known about how obligatory 
pitch accentuation is in prefinal position in Cuzco Quechua, the analysis of the L* 
in the (e)-contour is especially worthy of future investigation. 

Cuzco Spanish, finally, makes use of contours (b), (d), and (e), also in general 
supporting the hypothesis about an indirect, distributional relationship between 
information structure and prosody because all contours occur in all conditions, 
with the difference in preference for (d) in the broad focus condition vs the pref- 
erence for (b) in the condition with contrastive focus on the final adjective found 
to be statistically significant, but not the difference between the two contrastive 
conditions (Muntendam & Torreira 2016: 82-83). That is to say, Cuzco Spanish 
seems to mainly make a difference between a simple IP and one in which given 
material is phrased off in a separate iP. Based on the presence of contours (d) and 
(e) in both Cuzco Quechua and Spanish, Muntendam & Torreira (2016: 86) claim 
that this is evidence for cross-linguistic influence from Quechua to Spanish and 
even that this influence is “unidirectional: Spanish adopts prosodic features from 
Quechua but not the other way around”. I would argue that their own evidence 
does not support this assertion, and that more research on the intonational phonology 
of Cuzco Quechua, but also on neighbouring varieties of Quechua or Spanish, 
is needed before any conclusions about directionality of influence can be drawn. 
Firstly, given the lack of a full intonational analysis of the contours in Quechua, 
with the proclaimed use of the ToBI-like labels solely for “the practical purpose of 
distinguishing the contours” (Muntendam & Torreira 2016: 77, note 5) but no infor- 
mation provided on the criteria for contour identification, comparing Quechua 
contours with Spanish ones cannot go beyond establishing a superficial phonetic 
similarity. Secondly, even if that issue were settled, there are no grounds for claim- 
ing that the (d)- and (e)-contours are in any sense “original” in Cuzco Quechua, 
and only “adopted” in Cuzco Spanish. The only statement supported by the facts 
is that the Cuzco speakers seem to use two of the attested contours in both of the 
languages spoken by them, and one in only one of them. The two “shared” contours 
might just as well “originate” from their use of Spanish and have made their way into 
Quechua, or be a shared innovation of the speaker community transgressing language 
boundaries, in the way of “diasystematic constructions” (Höder 2014a, 2018) 
in prosody, since virtually nothing else is known about the prosody of neighbouring 
varieties, of Quechua or Spanish. 

In sum, even though considerable issues remain, Muntendam & Torreira (2016) 
make important headway into the study of prosody and IS in Cuzco Quechua. It 
seems that some of the same conclusions on the relation between information 
structure and prosody can be drawn as for Spanish: it appears to be distributional, 
insofar as there is not a categorical mapping between a contour type and a focus 
type.⁹¹ Whether it is also indirectly mediated via metrical structure remains yet to 
be seen. For Conchucos Quechua, no comparable study exists, and not much more 
than basic facts of its prosody are known. In section 6.1, I will develop an account 
of Huari Quechua prosodic and intonational structure based on quantitative and 
qualitative data that takes some elements from O’Rourke’s (2009) analysis, espe- 
cially her use of initial and final iP-boundary tones, but greatly reduces the role of 
word stress and pitch accents. I will furthermore lay out how this proposal relates 
to information structure in sections 6.2 and 6.4, and develop an OT-model of intona- 
tion that describes a prosodic variation space between the attested forms of Huari 
Spanish and Quechua in sections 5.3 and 6.3. The insights gained there might even 
help shed some light on the outstanding issues in Cuzco Quechua and Spanish. 


91 Roessig (2021: 81-87) also comes to the conclusion that no one-to-one mapping between focus 
types and pitch accent types seems to exist in West Germanic languages. Evidence from other studies 
considered there indicates that the distribution of continuous parameters like relative peak 
alignment and tonal onglide interacts with the distribution of categorical pitch accent types across 
focus types to ensure that they can still be recognized correctly in perception. 


4 Refined research questions 


Before moving on to the actual analyses of Huari Spanish and Quechua intonation, 
this interim chapter serves to refine and expand the research questions based on 
the background provided in the two previous chapters. In the introduction, three 
broad leading questions (1), (2) and (3) were given, here repeated as (34)-(36) with 
their subquestions. 

In large parts, this is an exploratory study on the prosody of two undescribed 
varieties of Spanish and Quechua, so even broad answers to these questions 
should fill a research gap. However, based on the preceding theoretical discussion, 
all three of them can be further refined. Specifically, we can ask the expanded 
version of (1), (34), of both Huari Spanish and Quechua. 


(34) What are the relevant properties of the intonational systems of Huari Spanish 
and Huari Quechua? 

a. What is their tonal inventory? Are tones only edge-seeking or can some 
be identified as pitch accents associated with a metrically strong position 
as well? How many options are available at each position? 

b. What evidence is there for which levels of prosodic structure? 

c. How are tones distributed across the units of the prosodic structure? 


In particular for Quechua, one very important question regarding word prosody 
needs answering, without which the other questions cannot fully be answered. In 
the previous chapter, we saw that the evidence for word stress in Ancash Quechua 
varieties is mixed at best (section 3.3.3), especially when adopting a definition of 
word stress as culminative and obligatory, following Hyman (2014). Based on this 
discussion, the following question can be asked: 

d. What evidence is there that pitch events in Huari Quechua are affected 
by a metrically strong position at the word level that falls under Hyman's 
definition of word stress? In particular, can stress-sensitive pitch events 
be disentangled from edge-seeking ones based on evidence from tonal 
alignment? 


This is a question tailored to Huari Quechua, but it will be useful to compare Huari 
Quechua here to Huari Spanish, for which the assumption of word stress is much 
less controversial. 

Regarding (2), it is in a sense itself a subquestion to (1), so its expanded questions 
(35) can also be brought to bear on answering the overall question which 
prosodic properties are relevant. In section 3.7 the scope of this question was 
restricted in that mostly, only those kinds of discourse meanings that are collected 
under the header of information structure are in focus here. However, other kinds 
of discourse meanings and their cues also have to be taken into consideration to 
some extent in order to disentangle the cues for information structural meanings 
from them. 

Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the Creative 


(35) How and what kinds of interactional/discourse meaning do they encode? 

a. To what extent are paradigmatic tone contrasts (different boundary tones 
and/or pitch accents, different nuclear configurations) used to encode 
discourse meanings? 

b. To what extent are syntagmatic tonal devices (phrasing via edge tones, 
deaccentuation/dephrasing etc.) used to encode discourse meanings? 

c. Are prosodic cues used on their own, or in conjunction with, or vicariously 
for, other cues to discourse meanings (word order, morphology, particles)? 

d. Can the role of tonal scaling be shown to be restricted to a “phonetic” 
scalar encoding of “emphasis”? Does it contribute to the paradigmatic tonal 
inventory, or is it used nonlocally to cue prosodic structure and discourse 
meaning, as described for a number of languages (cf. section 3.6)? What 
does this mean for whether the prosodic structure itself is recursive or not? 


Along more theoretical lines, answering these questions will also shed light on 
a question that emerged in section 3.7.3, but whose complete answer this thesis 
cannot provide. 

e. Do the identified cues exhibit a direct or even biunivocal relation to the 
proposed categories of information structure and other discourse meanings, 
or is it more intermediate and distributional, as discussed in 
section 3.7.3? Does this have effects on the conception of these categories 
themselves, and what role does context play? 


Finally, (3) can also be expanded, as (36). 


(36) Which of these properties are specific to one language, and which are perhaps 
common to both? 

a. Based on the answers to (1)/(34), do the differences between Huari Spanish 
and Quechua occupy neatly definable positions along the typological 
dimensions laid out in section 3.4? 

b. Do their variation spaces overlap and how can this be shown? 

c. Especially assuming that a considerable amount of variation will be 
found, what does this mean for prosodic typologies and their proposed 
objects, languages? 


Not all of these questions will be answered to equal degrees of depth across both 
languages. (34)d is clearly more aimed at Huari Quechua. In contrast, (35)d will 
be more thoroughly explored for Huari Spanish than for Quechua. As the whole 
study employs various methods, the type of evidence brought to bear on each 
individual question also varies. Broadly speaking, sections 5.1.1 and 5.1.2, and 6.1 
aim at answering the first group of questions (34) for Huari Spanish and Quechua, 
respectively, with 6.1.6 particularly concerned with answering (34)d via a quanti- 
fied comparison of peak alignment in Huari Quechua and Spanish. Sections 5.1.3 
and 5.2, and 6.2 and 6.4, respectively, seek to provide answers to the second group 
of questions (35) for Huari Spanish and Quechua, with 5.2 and 6.4 both also particularly 
concerned with (35)d, but using different methods. I use the Spanish and 
Quechua OT-analyses in sections 5.3 and 6.3, respectively, as well as the concluding 
section 7.4, to tackle the third group of questions interested in the interrelationship 
of the prosodic systems (36). However, since the overarching goal of the study is to 
provide descriptions of the prosodic systems of the two languages, not every question 
has its own section where it is answered. Rather, the answers to the individual 
questions will result from the analyses as a whole. 


5 Huari Spanish 


This chapter is concerned with the description and analysis of the main intonational 
phenomena of Huari Spanish. It gives an overview of how intonation in 
Huari Spanish varies with pragmatic factors like utterance type and information 
structure. In the first section (5.1), the description is mostly based on relatively 
simple utterances. It will be established that in the majority of Huari Spanish 
declarative utterances, each accentable word forms an LH* pitch accent, also in 
phrase-final position. The pitch peak that is formed occurs within the stressed syl- 
lable, also in prenuclear accents. Utterances conforming to this description will be 
said to make up the “main” intonational variant of Huari Spanish. Besides describing 
the nuclear configuration also for interrogatives and documenting some more 
marginal phenomena, an important contribution of this chapter is the description 
of a variant accentuation behaviour for declaratives, encountered in a number of 
utterances, in which a single right-aligned rising (LH) or rising-falling pitch event 
(LHL) takes place over a number of accentable words. This intonational variant 
will be called “phrase accentuation” in order to separate it from the “main” variant. 
Possible factors explaining its occurrence will be explored. 

In the second part of the chapter (5.2), a particular set of complex utterances 
from Elqud consisting of two topics and two comments will be described. They will 
be argued to present evidence for pitch scaling used to cue a hierarchical prosodic 
structure nonlocally and for a recursive prosodic structure, providing an answer to 
research question (35)d regarding Huari Spanish. In addition, a sizeable number of 
utterances of this type displaying the “phrase accentuation” will be analysed separately 
and the resulting insights then lead to the final analysis in the third part of 
the chapter. 

There, some of the observations from the preceding two parts are formalized 
to produce an OT-analysis that allows for an understanding of the different into- 
national variants as the instantiation of a cluster of values that variable prosodic 
properties can assume. This will lead to a conception of how they can be related to 
each other as well as to the variants described for Huari Quechua. 


5.1 Simple utterances 


In this first section on Huari Spanish, a number of intonational phenomena will be 
described based on relatively simple utterances from Conc, Maptask, Cuento, and 
Elqud. I begin with declaratives (5.1.1), move on to interrogatives (5.1.2), and then 
discuss meaning-related aspects of prosodic variation (5.1.3). 


5.1.1 “Main variant” neutral declaratives 


This section describes the most frequent intonational behaviour for declaratives, 
what I will call the “main” variant of declarative accentuation. I will first give a 
qualitative description with examples and then present quantification results over 
a subset of the data. 


5.1.1.1 Introductory description 

In the majority of declarative utterances in Huari Spanish, each or nearly each 
content word is accented on its stressed syllable (see (37)-(40) and the correspond- 
ing Figures 20-23). Peaks are formed on and within each accented syllable, both if 
they are final and if they are prefinal in the phrase or the utterance, independent of 
whether a high or a low boundary tone follows. In continuation rises (with a high 
boundary tone, annotated as H- or H% in ToBI), when the last accented syllable 
before the phrase boundary is not also the phrase-final syllable, the pitch move- 
ments of the pitch accent and the phrase-final rise can very often be clearly distin- 
guished (cf. grande and the two first instances of gigante in Figure 22 and olla in 
Figure 23). Again in the overwhelming majority (see also next section), the peaks 
that are observed on accented syllables are all aligned well within the stressed syl- 
lable, also on pre-final and proparoxytone final words, unlike in peninsular varie- 
ties of Spanish, but similar to what has been reported for Cuzco and Lima Spanish 
by O'Rourke (2005). Final low boundary tones (L%) realize their targets immedi- 
ately following the peak on the final accented syllable, even if the final word is a 
proparoxytone and thus in principle providing some space for the realization of the 
boundary tone (cf. Figure 23). The final boundary tone is mostly not deleted even in 
final oxytones, realizing a full low target (cf. Figure 21), although it might be trun- 
cated insofar as it does not reach the same level as other low targets after a very 
high final peak (cf. Figure 20). Peaks are preceded by clear troughs or valleys that 
often extend from directly after the last accented syllable and which are only elim- 
inated under strong time pressure conditions. These low stretches are taken to be 
evidence of low tonal targets realizing an L tone, just as the peaks are high targets 
realizing a H tone. It seems that in the majority of utterances, the pitch peak in the 
stressed syllable is reached roughly in the middle of the vowel, so that at the end of 
the stressed syllable, pitch has already slightly fallen from the local maximum. The 
elbow marking the end of the trough preceding the peak usually occurs in the syl- 
lable before the stressed syllable, or at the latest right at the beginning of it. These 
observations hold if no other tonal event is encroaching too closely (less than two 
syllables away) upon the peak, and if no particular pragmatic conditions obtain 
that seem to effect a divergent target placement (see section 5.1.3.3 below) and if 
not another, more edge-oriented mode of accentuation is employed, either also for 
pragmatic reasons or due to interspeaker variation (see section 5.1.3.1). The general 
analysis for Spanish declaratives is therefore here that each tonal movement 
related to an accent, also in final position, consists of an LH tone sequence, associ- 
ated as a pitch accent annotated in ToBI as LH*, with the high target aligned within 
the stressed syllable and the low target directly before it, creating low stretches to 
the left. This is exemplified by typical examples like (37)-(40) below:⁹² 


(37) ZR29_Cuento_ES_0083 

después de un rato llega un colibrí que le dice al oído que su nieta le necesita 
LH* LH* LH* LH* LH* H- LH* LH* LH* LH* L% 

“after a while a hummingbird comes that tells him in his ear that his granddaughter 
needs him” 


(38) TP03_Cuent_ES_0513 

tan grande como el gigante que le di- dió miedo al gigante y el gigante huyó 
LH* H- LH* H- LH* LH* LH* H- LH* LH* L% 

“as big as the giant so that it gave the giant a fright and the giant fled” 


(39) KP04_MT_ES_0799 

por eso p(u)e(s) le hago un círculo hacia abajo 
LH* LH* LH* LH* L% 

“right because of that I’m making a circle around it downwards” 


(40) ZZ24_MT_ES_1269 

de la olla debajo del murciélago 
LH* H- LH* LH* L% 

“from the pot, under the bat” 


92 Stressed and pitch accented syllables are given in bold, with the pitch accents aligned under 
them. Stressed but not accented syllables are given in italics. Indefinite articles (un, ur/a/o) are 
treated as regularly accentable, i.e. stressed even if not pitch accented because in the data here they 
often do realize an identifiable pitch accent, unlike the definite articles (el, la, los, las), which are 
never accented or stressed. Cf. Quilis (1993: 390–395); Hualde (2009), where the same distinction 
is made. 



Figure 21: TP03_Cuent_ES_0513⁹⁶ (main accentuation declarative). Cf. (38). 


The placement of the low tonal targets allows us to draw inferences about both the prosodic 
structure of accented and unaccented words and the spreading behaviour of the 
tones involved. In all of the examples discussed here, we see that the transition 
from a high target realizing the H tone in an LH* or a H- continuation rise to the 
upcoming low target realizing the leading tone of the following LH* suggests leftward 
spreading or multiple alignment of this low target. This is especially evident 
after the continuation rises in Figure 21, which are followed by an abrupt reset to 
low, even though several syllables intervene before the next accentuation event, 
in principle leaving time to reach the next low target more gradually. The same 
leftward encroachment of the low stretch is also seen in Figure 23 at the end, after 
the stressed syllable of the final proparoxytone murciélago. Since this final L can 
be taken to belong to the end of the prosodic constituent, the IP, of which it is the 
boundary tone, its relatively early realization should be seen as multiple alignment 
or leftward spreading of the low target. This holds also for the behaviour of the 
L tone in the LH* pitch accents: as leading tones they belong to their H*s, but the 
low stretch formed by them is evidence that they seek to align in both directions. 
They often form an elbow before the rise to the high target immediately before the 
stressed syllable. The question remains how far the low target extends leftwards. 
The difference between Figures 22 and 23 is helpful here. While in Figure 23, the 
target of the final L% is realized immediately following the stressed syllable of the 
final murciélago, in Figure 22 there is a much gentler drop after the stressed syllable 
on the equally proparoxytone círculo, reaching the low target only at the beginning 
of the next word, the unaccented hacia. These and similar cases in the data 
suggest that the low stretch preceding a pitch accent is blocked both by a preceding 
high tone and the boundary of a prosodic constituent of the same level or higher 
than the prosodic word. For this proposal to work, it has to be assumed that unaccented 
words form a prosodic word together with the next accented word to their 
right, but not to their left. 


Figure 22: KP04_MT_ES_0799⁹⁷ (main accentuation declarative). Cf. (39). 


Figure 23: ZZ24_MT_ES_1269⁹⁸ (main accentuation declarative). Cf. (40). 


93 Visualizations of Huari Spanish and Quechua examples in the figures consist minimally of a 
pitch track and spectrogramme, with a time-aligned transcription. For the Huari Spanish examples, 
the transcription is orthographic and in at least two tiers. The first tier is segmented according to 
word boundaries, the second gives the boundaries of stressed syllables. 

94 The pitch object for this picture was created with a voicing threshold of 0.45 and some 
disturbances were manually removed. 

95 https://osf.io/vyxtp/ 

96 https://osf.io/ypmb4/ 

97 https://osf.io/87j56/ 

98 https://osf.io/hv4ab/ 

I call the accentuation behaviour whose properties were just described the 
“main” variant of Huari Spanish, both because of its frequency (see next section) 
and its similarity to other varieties of Spanish. In regularly realizing a rising accent 
also on the final accented word in an utterance, the data here not only resemble 
those described for other varieties of Peruvian Spanish, but also for central Mexican 
Spanish (De-la-Mota et al. 2010), where a so-called “circumflex contour”, similar to 
what is described here, also most frequently forms the nuclear pitch accent in contexts 
of “broad focus”. This is in contrast to findings on Madrid Spanish, where 
“broad focus” contexts have been found to often correlate with a low nuclear configuration 
(L*L%; cf. Estebas Vilaplana & Prieto 2010; Hualde & Prieto 2015). For 
Mexican Spanish, the L*L% configuration is reported to be only a minor variant, 
and in our data, as the next section will show, it is virtually nonexistent in such con- 
texts. Another difference concerns peak placement. According to De-la-Mota et al. 
(2010: 324), the peak in the final accented word is aligned at the end of the stressed 
syllable in Mexican Spanish, whereas here it is aligned in the middle of the stressed 
vowel. Mexican Spanish also regularly has the delayed prenuclear pitch accents 
transcribed as L+<H*, which are exceedingly rare here. Thus while the “main” 
variant is similar to other intonational varieties of Spanish in its very regular pitch 
accentuation of accentable words, it also differs in several aspects from them. 


5.1.1.2 Quantitative results on accentuation in data from Conc 

This section provides a quantified perspective on the frequency of the central components 
of the “main” variant, i.e. that each accentable syllable realizes an LH* pitch 
accent whose peak is aligned within the stressed syllable. It is based on a subset of 
the Huari Spanish data. Annotations and measurements were made on all content 
and all polysyllabic function words from seven corpora from the Conc game, by 
the speakers TP03 & KP04, QZ13 & OZ14, SG15 & QF16, AZ23 & ZZ24, ZR29 & HA30, 
XU31 & 0A32, and XQ33 & LC34, to provide quantifiable results about how often 
words are pitch accented and the frequency of the LH* pitch accent. These Conc 
corpora consist almost exclusively of declaratives. Content words were taken to be 
nouns, adjectives, adverbs, and verbs (except the two copulas ser and estar); func- 
tion words were all others (articles, prepositions, demonstratives), with the token 
majority of polysyllabic function words in Conc consisting of deictic expressions 
such as ahí, allá, acá etc. The two copular verbs ser and estar were taken to be function 
words, so only polysyllabic forms of them were considered. Apart from monosyllabic 
function words, words were also excluded due to noise or when their pitch 
track was otherwise very fragmented. Words were counted as being accented with 
a pitch peak when the highest measured pitch in the sonorant part of the stressed 
(tonic) syllable was at least 7 Hz higher⁹⁹ than the lowest measured pitch in the sonorant 
part of the rhyme of the pretonic, or in the sonorant part of the posttonic. Only 
if the pitch range difference to the pretonic obtained was the pitch accent identified 
as LH*. If no pitch difference >7 Hz was found, the word was counted as unaccented. 
Table 6 gives the counts for accented vs. unaccented words as just described for all 
speakers, sorted according to word type (content/function). The two right columns 
give accentuation ratios, as words per pitch accent in the penultimate column, and 
as percentage of accented words among all considered words in the final column. 


Table 6: Pitch accentuation counts in seven Spanish Conc corpora, sorted according to word type. 


Word type   Accented   Unaccented   All   Ratio words/accent   % accented words 
Content     391        46           437   1.12                 89.5 
Function    66         38           104   1.58                 63.5 
All         457        84           541   1.18                 84.5 
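
The accentuation criterion described before Table 6 can be sketched as a small decision rule. The following Python function is a minimal illustration under stated assumptions, not the procedure actually used for the annotations: the function name, parameter names and the label for accents other than LH* are hypothetical, the f0 values are taken to be measured beforehand in the sonorant stretches named in the text, and the threshold is read as "at least 7 Hz" (the text also phrases it as ">7 Hz").

```python
from typing import Optional

THRESHOLD_HZ = 7.0  # threshold named in the text; ">=" is an assumption of this sketch


def classify_word(tonic_max: float,
                  pretonic_min: Optional[float] = None,
                  posttonic_min: Optional[float] = None,
                  threshold: float = THRESHOLD_HZ) -> str:
    """Classify one word from f0 values (in Hz) measured in sonorant stretches.

    tonic_max     -- highest f0 in the stressed (tonic) syllable
    pretonic_min  -- lowest f0 in the rhyme of the pretonic syllable, if there is one
    posttonic_min -- lowest f0 in the posttonic syllable, if there is one
    """
    rises_from_pretonic = (
        pretonic_min is not None and tonic_max - pretonic_min >= threshold
    )
    falls_to_posttonic = (
        posttonic_min is not None and tonic_max - posttonic_min >= threshold
    )
    if rises_from_pretonic:
        return "LH*"               # accented, rise from the pretonic: counted as LH*
    if falls_to_posttonic:
        return "accented (other)"  # accented, but not identified as LH*
    return "unaccented"            # no sufficient pitch difference on either side


# Invented f0 values, for illustration only.
print(classify_word(tonic_max=182.0, pretonic_min=168.0, posttonic_min=175.0))
print(classify_word(tonic_max=182.0, pretonic_min=179.0, posttonic_min=170.0))
print(classify_word(tonic_max=182.0, pretonic_min=178.0, posttonic_min=179.0))
```

Checking the pretonic first mirrors the statement that only a sufficient pitch difference to the pretonic leads to the label LH*; a word whose peak only stands out against the posttonic is still counted as accented.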


The first result presented in this table is that on average, nearly 9096 of all content 
words are pitch accented, or that an accent occurs once every 1.12 content words. This 
is in broad agreement with previous findings about Spanish accentuation which state 


99 This threshold was chosen for comparability to the results in O'Rourke (2005: 62, note 10; 76, 
note 6,7) where it is used as a compromise between the results regarding psychoacoustic measure- 
ments of just noticeable differences in f0 in Klatt (1973) and Pierrehumbert (1979). The discussion in 
Fastl & Zwicker (2007: 182-188) suggests that it might be a slightly high threshold for the range of 
75–700 Hz, the f0 range of the majority of human speech. Their data is however based on perception
experiments with artificial sounds, not actual conversations with normal levels of background
noise. The same 7 Hz threshold is also chosen in Rao (2009). 
that nearly every content word is pitch accented.¹⁰⁰ The overall accentuation percentage
of nearly 85% of all words considered is comparable to the 77% given in Rao (2009:
15) for spontaneous speech in Barcelona Spanish. The table also attests to a pronounced
difference in accentuation between content and function words, with function words
accented about 26 percentage points less frequently than content words. A χ²-test was done on the
cells counting accentuation for all speakers (shaded in grey in the table), with its result
suggesting that word type is indeed highly significantly associated
with the observed difference in accentuation (Pearson’s χ²(1) = 43.338, p < 0.001). That
function words are less frequently accented than content words is also a result broadly 
in keeping with the literature on Spanish prosody (cf. Hualde 2009; Rao 2009). 
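The test statistic reported above can be recomputed from the counts in Table 6. The following is a minimal sketch, assuming a Pearson χ² test without continuity correction on the 2×2 table of content/function by accented/unaccented; the function name is hypothetical, and only the Python standard library is used.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Table 6 counts: (accented, unaccented) for content and function words.
chi2, p = pearson_chi2_2x2(391, 46, 66, 38)
print(f"chi2(1) = {chi2:.3f}, p = {p:.2e}")

# Accentuation percentages and words-per-accent ratios from the same counts.
for label, acc, unacc in [("Content", 391, 46), ("Function", 66, 38)]:
    total = acc + unacc
    print(label, f"{100 * acc / total:.1f}%", f"{total / acc:.2f}")
```

Under these assumptions the statistic agrees with the value given in the text to three decimal places, and the percentages and ratios match the two right-hand columns of Table 6.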


Table 7: Pitch accentuation counts in seven Spanish Conc corpora, sorted according to whether 
words occurred in isolation or as part of phrases containing several words and word type. 


Multi-word phrases

Words per phrase   Accented   Unacc. (of which     All   Ratio words/   % accented
                   words      function words)            accent         words
4-word phrase      18         10 (5)               28    1.55           64.3
3-word phrase      68         13 (6)               81    1.19           84
2-word phrase      196        32 (16)              228   1.16           86

All words in multi-word phrases together

Word type          Accented   Unaccented           All   Ratio words/   % accented
                                                         accent         words
Content            236        28                   264   1.12           89.4
Function           46         27                   73    1.59           63.0
All                282        55                   337   1.20           83.7

Single-word phrases

Word type          Accented   Unaccented           All   Ratio words/   % accented
                                                         accent         words
Content            155        18                   173   1.12           89.6
Function           20         11                   31    1.55           64.5
All                175        29                   204   1.17           85.8


Table 7 explores a further possible factor influencing accentuation, namely whether 
words are more likely to be accented depending on their realization in isolation or as 


100 The rate of 1.12 content words per pitch accent can be compared to the ratio of 1.27 content 
words per phonological phrase (defined as having a single identifiable pitch contour) found for 
Quechua in section 6.2.3.2 and based on a sample of similar size and type, consisting of all nominal 
sequences from the seven Conc games in Quechua by the same speakers considered here, plus one 
Maptask and one Cuento corpus. 
part of a phrase containing several (>1) words. Torreira et al. (2014) state that words
in phrase-medial position in Madrid Spanish are more likely to lack pitch accentua- 
tion than phrase-peripheral words. This would mean that on average, words in mul- 
ti-word phrases should be less frequently accented than words realized alone, which 
are always peripheral. Words were counted as being together in a phrase if no dis- 
cernible disjuncture was perceived amongst them, e.g. a hesitation, short break, or 
intonational boundary movement, corresponding roughly to break index level 2 or 3 
of Sp_ToBI (cf. Aguilar et al. 2009). Monosyllabic functional elements such as articles or 
prepositional clitics were again disregarded, i.e. a multi-word phrase contains at least 
two words that are either content words or polysyllabic function words. The results in 
Table 7 indicate that the ratio of accentuation is not different overall between words 
in multi-word phrases as opposed to words produced in isolation or only together 
with clitics. Only for 4-word phrases does there seem to be a stronger tendency for 
words to be unaccented, but this result is statistically not fully conclusive and would 
need more data from words in 4-word phrases to be corroborated.¹⁰¹ A χ²-test of
whether being in a multi-word phrase (without internal differentiation) vs.
being in a single-word phrase is associated with a difference in accentuation turned
out not to be significant (Pearson’s χ²(1) = 0.43, p = 0.51), as could be expected from the
very similar accentuation ratios in the table. Overall, this suggests that if an effect of 
being phrase-medial exists for the likelihood of a word to be accented, it seems too 
weak to emerge outside of four-word phrases or at all in this sample. 
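Both tests on phrase length can be retraced from the counts in Table 7. The sketch below is an illustration under stated assumptions rather than the original analysis script: it assumes Pearson χ² tests without continuity correction, takes the row for 1-word phrases from the "Single-word phrases" block, and uses closed-form p-values that are valid only for 1 and 3 degrees of freedom. All function and variable names are hypothetical.

```python
import math
from typing import List, Tuple


def pearson_chi2(table: List[List[int]]) -> Tuple[float, int, float]:
    """Return (chi2, df, smallest expected count) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail p-value; closed forms for df = 1 and df = 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 3:
        return (math.erfc(math.sqrt(chi2 / 2))
                + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))
    raise ValueError("only df = 1 and df = 3 are handled in this sketch")


# Rows are (accented, unaccented), taken from Table 7.
multi_vs_single = [[282, 55], [175, 29]]
by_length = [[175, 29], [196, 32], [68, 13], [18, 10]]  # 1-, 2-, 3-, 4-word phrases

for name, table in [("multi vs. single", multi_vs_single), ("by length", by_length)]:
    chi2, df, min_exp = pearson_chi2(table)
    print(f"{name}: chi2({df}) = {chi2:.2f}, p = {chi2_p_value(chi2, df):.2f}, "
          f"smallest expected count = {min_exp:.2f}")
```

The smallest expected count in the four-row table falls in the cell for unaccented words in 4-word phrases and lies below 5, which is the reason for the caution expressed in the footnote on this test.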

In answer to the question about the frequency of LH* as pitch accent and what
other variants were encountered (see Table 8), 126 or 23.3% of all words were identified
as not having an LH* pitch accent. Those include all words counted as unaccented.
Of the 457 words counted as accented (with an identifiable peak), 42 (9.2% of
accented words) were identified as having a different pitch accent than LH*. Of those,
26 are cases where the word in question occurred directly in phrase-initial position
with the initial syllable also being the tonic and where no rise leading up to the peak
was found, so that the pitch accent was classed as H* instead of LH* (cf. OA32 Conc
ES 0298, the right image in Figure 24). With these cases it is probably the prosodic
context that is responsible for this realization, i.e. the lack of pretonic material on 
which the rise can be realized leads to an increased likelihood of rise truncation. 
The context did not categorically lead to this realization, as OA32 Conc ES 1549 (left 


101 A χ²-test was done to investigate association between accentedness in phrases of the four different
lengths, yielding a significant result (Pearson's χ²(3) = 9.37, p = 0.02), but expected counts in
the cell of unaccented words in 4-word phrases were below 5, rendering the test somewhat unreliable
(cf. Field et al. 2012: 818), especially since this was also the only cell in which the standardized
residual was greater than |1.96|, suggesting it mainly contributed to the test being significant. A
Fisher's exact test (two-sided, p = 0.04) was just about significant.
Table 8: Words not counted as having an LH* pitch accent in seven Spanish Conc corpora.

                                                    Accented       Unaccented     All
                                                    (percent of    (percent of    (percent of
                                                    all words)     all words)     all words)

Words not counted as having an LH* pitch accent     42 (7.8)       84 (15.5)      126 (23.3)
- Of which H* without leading rise on words in      26 (4.8)       0 (0)          26 (4.8)
  phrase-initial position with initial tonic
- Of which H tone part of preceding plateau         14 (2.6)       24 (4.4)       38 (7)
  (H*L/HL*)
- Of which only H% phrasal boundary                 1 (0.2)        11 (2)         12 (2.2)
- Of which L* L%                                    0 (0)          2 (0.4)        2 (0.4)
- Of which others                                   1 (0.2)        47 (8.7)       48 (8.9)


image in Figure 24) with a more fully pronounced rise shows, but conversely, the trun- 
cated realization was only found in this context. Since the discourse context for the 
more LH*-like realizations and the H*-like realizations is the same, I assume that the 
realization without a rise is here only a prosodically conditioned truncated variant 
of the LH* pitch accent. Consequently, counting them together, a remaining 16 of 457 
accented words (3.5%) were identified as having a different pitch accent than LH*.
In particular, words in phrase-final position also realized the LH* pitch accent most 
frequently, and what could be identified as the nuclear configuration L* L%, familiar 
from the literature on many other Spanish varieties, was only found twice. Equally, 
the delayed rising accent L+<H*, frequently attested on prenuclear words in peninsular
Spanish varieties, was only identified once here, and only tentatively. In the vast
majority here, peaks of rising pitch accents (those identified as LH*) were thus found 
to be realized within the tonic syllable (a result also confirmed on another dataset 
in section 6.1.6), in agreement with what O'Rourke (2005) reports for Cuzco Spanish. 
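The percentages cited in this passage follow directly from the counts in Tables 6 and 8. As a rough consistency check, and assuming that the five subcategories of Table 8 are mutually exclusive, they can be recomputed as follows (the dictionary keys are shortened labels of my own, not terms from the tables):

```python
TOTAL_WORDS = 541      # all words considered (Table 6)
ACCENTED_WORDS = 457   # words with an identifiable peak (Table 6)

# (accented, unaccented) counts per category, as given in Table 8.
not_lh_star = {
    "H* without leading rise, phrase-initial tonic": (26, 0),
    "H part of preceding plateau (H*L/HL*)": (14, 24),
    "only H% phrasal boundary": (1, 11),
    "L* L%": (0, 2),
    "others": (1, 47),
}

accented_not_lh = sum(acc for acc, _ in not_lh_star.values())
unaccented = sum(unacc for _, unacc in not_lh_star.values())


def pct(count: int, base: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * count / base, 1)


print(accented_not_lh, unaccented, accented_not_lh + unaccented)
print(pct(accented_not_lh + unaccented, TOTAL_WORDS))  # share of all words
print(pct(accented_not_lh, ACCENTED_WORDS))            # share of accented words

# Treating the truncated H* cases as prosodically conditioned variants of LH*:
truncated = not_lh_star["H* without leading rise, phrase-initial tonic"][0]
remaining = accented_not_lh - truncated
print(remaining, pct(remaining, ACCENTED_WORDS))
```

The subcategory counts sum to the totals in the first row of Table 8, and the remaining share of accented words without LH* comes out at the 3.5% stated above.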


Figure 24: OA32 Conc ES 1549 (left) and 0298 (right) (águila ‘eagle’).
In the other cases where tonal movement on the tonic could not be identified as
realizing an LH* pitch accent (in words both with and without peak), apart from 
simply flat realizations that make up a majority of the words counted as unaccented, 
two particular phenomena were observed. On the one hand, the rise expected on 
the tonic sometimes occurred only posttonically, in phrase-final words (12 cases). 
Instead of classing them as pitch accents with a delayed peak (L+<H* or L*H), I 
suggest that these cases look most likely to exhibit only boundary movement, i.e. H- 
or H%, but not a pitch accent. This is because they only occurred phrase-finally and 
because in proparoxytones, not only the tonic, but also the subsequent posttonic 
syllable was found to be low, with the rise taking place only on the (phrase-)final 
syllable, as in XQ33_Conc_ES_0966 (Figure 25). Because the tonic was low in most of 
these, they were counted as unaccented (11 of 12). These cases are similar to what 
is described as the “phrase accentuation” variant in section 5.1.3.1.
Figure 25: XQ33 Conc ES 0966¹⁰⁴ (debajo del águila ‘underneath the eagle’).


On the other hand, in some cases the elevated pitch expected on the tonic was found 
already on preceding syllables. The tonic was then either also realized with high 
pitch, followed by a fall on or into the posttonic, or the fall already took place on 
the tonic itself, or it was even realized largely low after the fall from the pretonic. 
I provide several examples (in Figures 26-30) to demonstrate the variability
of the observed phenomenon. Perceptually, some of these sound like stress
has shifted towards the pretonic, or even towards the preceding indefinite article
in some cases. What they have in common is the lack of low targets on the pretonic,
which are replaced by a high plateau-like realization, and that the tonic,
rather than being a location for a pitch peak, seems to serve as a landmark on


104 https://osf.io/4685c/ 
Figure 27: HA30 Conc ES 0336 (con su dinero ‘with her money’).


which this high plateau ends. In those examples that include an indefinite article 
(Figures 26, 28, 29), it could be argued that this is merely a result of undershoot of 
the low tonal targets after a pitch accent on the indefinite article, in combination 
with perhaps an H*L or even HL* pitch accent on the tonic of the content word, 
but such an analysis, apart from leaving unexplained why an H*L or HL* pitch 
accent should occur in the same discourse contexts in which LH* otherwise occurs, 
is clearly insufficient for AZ23 Conc ES 1851 (Figure 30). There, two pretonic syllables
precede the tonic on the content word millonario ‘millionaire’ and form a
plateau that is as high as the highest value on the tonic, on which the beginning of 
the fall occurs. Because of examples like this one (cf. also Figure 27), instead of an 


105 https://osf.io/r73wd/ 
Figure 28: OZ14 Conc ES 2242¹⁰⁷ (una casa ‘a house’).
Figure 29: QZ13 Conc ES 1567 (una montaña ‘a mountain’).


analysis e.g. in terms of stress shift, I would like to suggest that at least some of these 
cases are best understood as reflecting a prosodic configuration also encountered 
in the Quechua data (cf. section 6.1.2), where plateau-like realizations are common 
that extend from the initial boundary of a word to either the penult or the end of 
the phrase, and in which the tonic syllable serves at most for anchoring a tonal 
transition, here from H to L. In the Spanish Conc data discussed here, this type of 
realization was found on 38 of all 541 words (7%). Depending on where the fall
took place and whether the tonic was still higher than the posttonic, representable 
broadly in ToBI as the difference between H*L and HL*, these cases were counted 


107 https://osf.io/bhf3e/ 
Figure 30: AZ23 Conc ES 1851 (está el millonario ‘(there) is the millionaire’).


as either accented (14 of 38) or unaccented (24 of 38). Table 8 sums up the counts of 
the categories discussed in this section. 

The variable tendency for creating plateau-like realizations was also observed 
in phrase-final words with pitch ending high, i.e. presumably delimited by a high 
boundary tone. In a number of such cases, pitch was suspended at the same high 
level it had reached on the pitch accent (LH*) realized on the tonic until the end of 
the word. The resulting high plateau-like realization also extends to the intervening 
syllable in proparoxytones, as examples like LC34 Conc ES 1298 (above in Figure 
31) show. For comparison, consider the examples in Figure 32 without such a pla- 
teau-like realization. 

Here, pitch drops again or stays level after the pitch accent on the tonic (even 
in the paroxytonic naranja), before then realizing a separate phrase-final rise 
that is also usually scaled higher than the pitch accent. The variant with the pla- 
teau-like realization is preferred in the seven Conc corpora by the speakers QZ13, 
OZ14, AZ23, ZZ24, XU31, OA32, and XQ33, who use it almost exclusively, while TP03,
KP04, SG15, QF16, ZR29, HA30, and LC34 mostly prefer the realization with a sep- 
arate rise (in LC34’s case, the example given in Figure 31 is the only time he uses 
the plateau-like realization in Conc). No difference in discourse contexts could be 
made out that would differentiate the two variants functionally; they all seem to
be cases of continuation rises (in most cases) or occasionally uncertainty. The plateau-like
variant is quite similar to the realization with sustained pitch that Gabriel
et al. (2011) describe as one possibility for realizing intermediate phrases with high 
boundary tones (H-) in Porteño Spanish. The one with a separate rise is comparable
Figure 31: LC34 Conc ES 1298¹¹⁰ (del murciélago está el ‘of the bat is the’, top) and
ZZ24_Conc_ES_0994 (en el primero ‘in the first’, bottom).


to what they describe as a realization with a “continuation rise”, except that here,
as we have seen, there isn't a “continuous f0 rise from the last stressed syllable
until the break” (Gabriel et al. 2011: 163), but instead, f0 first either drops slightly
or stays level after the stressed syllable to then produce a separate rise. The realizations
where pitch drops slightly more resemble their realization with a “complex
boundary tone”, which is described as exhibiting “a small dip located between the
pre-boundary pitch accent and the high F0 peak signaling the boundary” (Gabriel
et al. 2011: 167). They propose that the dip is the effect of an additional L tone 
that forms a complex ip-level boundary tone together with the H-, but also con- 
sider the possibility that it is simply an effect of interpolation. I would tend to the 
latter interpretation, because otherwise two different boundary tone combinations 


110 https://osf.io/cke6p/ 
Figure 32: HA30 Conc ES 0245 (una naranja ‘an orange’, top) and QF16_Conc_ES_0315
(murciélago ‘bat’, bottom).


would be posited without differentiating them contextually. Interestingly, Gabriel 
et al. (2011: 178) relate the relatively high frequency of cases of sustained pitch in 
their Porteño data (32%) to the influence of speakers of Italian on the speech of
Buenos Aires, because Italian was found to have a far higher rate of sustained pitch 
(45.5%) compared to continuation rises (54.5%) than Peninsular Spanish (11.2% 
and 88.4%, respectively) in Frota et al. (2007). In our case, it should be noted that 
in the Quechua rising contour identified in data by the same speakers (cf. section 
6.1.1), pitch regularly forms high plateau-like realizations, and the multiple align- 
ment of the H tone responsible for these realizations is an important variable factor 
in the OT analysis of the Quechua data (cf. section 6.3). 
5.1.2 Interrogatives


This work is mainly concerned with declaratives. However, before picking up the
discussion of their variant intonational realizations, I will provide a description
of some of the intonational variation in the Huari Spanish data that is due
to a difference in utterance type, in particular a description of interrogatives. It 
has been claimed that interrogatives display a larger array of both prosodic and 
pragmatic variability than declaratives (Cangemi & Grice 2016: 11-12). While the 
discussion will showcase some of these fine-grained pragmatic differences, we will 
encounter a comparative absence of formal differentiation. 


5.1.2.1 (Neutral) polar questions 

Neutral polar questions, in which one speaker requests information from another 
without a bias for either answer, are in the majority produced with final rises in our 
Spanish data.¹¹⁴ At first sight, this makes them in principle very similar to declaratives
with continuation rises. At least for some speakers, however, they are formally
clearly differentiated: while in declaratives with continuation rise, the last
accented paroxytonic word in the phrase before the final rise realizes an LH* pitch 
accent, in polar questions it does not form a peak but instead, a valley or trough is 
realized on it which extends until the final rise on the phrase-final syllable. This 
can be illustrated by the near-minimal pair of XU31 MT ES 0871 (Figure 33) and 
OA32 MT ES 0890 (Figure 34). Both are formed on the segmental material debajo 
del árbol. Their context¹¹⁵ is given in (41): it shows that the first (at 87.1) is an unbiased
request for information by the speaker who does not have the path on the


114 For some small quantification: of the 43 neutral polar questions in Spanish Quién, the corpus 
with the highest relative occurrence of questions, 36 have a final rise. Of the 7 that don't (and are 
basically flat throughout the utterance), 4 are produced by a single speaker. Three of the rising 
polar questions do not have a final rise but rise throughout the utterance without marking promi- 
nent positions, but they are also fairly short utterances. 

115 The numbers in the leftmost column give the time in seconds at the beginning of the utterance 
in the same line. Consecutive turns by the same speaker are given separate lines and times when 
there are sufficiently long pauses (silences) between them. Overlap between speakers is indicated 
using square brackets, [ for where it begins and ] for where it ends (adopting a convention from GAT 
2, cf. Selting et al. 2009). If only ] occurs in a line, it means the utterance begins with overlap; if only [ 
occurs, the overlap continues until the end of the utterance. Disfluencies (false starts, self-repairs and 
the like) are marked by a dash after the transcription of a word up to the point of interruption: pas-, 
quier-, y-, etc. Transcription is based on standard orthography for Spanish. Where this is not the case 
it indicates a particular pronunciation or lexical item and is explained in a footnote. The transcription 
of interjections and hesitation tokens has also been conventionalized: ah is an update token, m/un the 
nasalized version of the confirmation/update token ajá, uh a filled pause, uhm its nasalized version. 
map in the maptask, taking the form of a polar interrogative. The other speaker
understands this to be a request (seen from his uttering the acknowledgment token 
ya) and then produces the same material as a first part of further instructions for 
how to proceed (at 89.5), indicating turn maintenance and that the instructions 
are not complete via the final continuation rise. At least for the speakers that do 
make this difference, the analysis for the nuclear configuration in non-biased polar 
questions therefore could be L* H% (but see below), while for the declaratives with 
continuation rise it is LH* H%. Note also the pitch accent on the prefinal debajo in 
both examples, which in both cases is identifiable as LH*. This is the case also in 
other polar questions with pre-final accented words, suggesting that prenuclear 
pitch accents are of the form LH* both in polar questions and declaratives. 


Figure 33: XU31 MT ES 0871 (debajo de árbol ‘below the tree’; neutral polar question with final
paroxytone). 


Figure 34: OA32 MT ES 0890 (ya debajo de árbol ‘yeah below the tree’; declarative with
continuation rise, final paroxytone). 
(41) XU31 OA32 MT ES 0831-0969 (context for XU31_MT_ES_0871 and OA32_MT_ES_0890)

time OA32 (the one with the path) XU31 (the one without the path) 
83.1 has encontrado un ovejita no 

87.1 del- debajo de árbol 

89.0 ya 


89.5 debajo de árbol 
91.0 y da la vuelta


A similar yet more subtle contrast can also be found with oxytones in final position. 
Examples for polar questions with final oxytones are fairly scarce (only 10 in the 
entire corpus studied here), but they do seem to support the following generalized 
description: in them, the low stretch before the final stressed syllable extends into 
it, forming a pitch elbow clearly within it, before then finally rising (cf. Figures 35 
and 36). In continuation rises with a final oxytone, on the other hand, the same 
pitch elbow is formed before the final stressed syllable or at its very beginning (like 
in declaratives in general), beginning the rise with it or even before reaching it (cf. 
Figures 37 and 38). This is consistent with an analysis in which the elbows are low 
targets for an L tone. In polar questions, we could then assume that this low tone is 
associated and aligned with the stressed syllable, followed by a high boundary tone 
(L* H%), whereas in continuation rises it is the general declarative pitch accent LH* 
followed by the boundary tone (LH* H%). However, a slightly different analysis sug- 
gests itself when we also take polar questions with question tags into consideration. 
Figure 35: MD40 MT ES 0324 (por su encima también ‘also above it’; neutral polar question with
final oxytone). 
Figure 36: AZ23 Quien ES 0747 (en esta reunión está ‘is s/he in this meeting’; neutral polar
question with final oxytone). 


Figure 37: QZ13 MT ES 0446 (y encima del ricachón ‘and above the rich guy’; declarative with
continuation rise, final oxytone). 


Polar questions with the question tag no such as José viene mañana, no? are usually 
taken to convey a confirmation bias, i.e. the speaker asks for a truth value on the 
proposition expressed by the question but with the expectation that this truth value 
will be of the same polarity as the proposition itself (Farkas & Bruce 2010). Confir- 
mation-seeking polar questions (checks) have been found to have a prosodic form 
different from that employed for neutral polar questions in several Romance lan- 
guages, including Bari Italian (Grice & Savino 1997; Grice & Savino 2004), Majorcan 
Catalan (Vanrell et al. 2013), Puerto Rican Spanish (Armstrong 2010), and somewhat 
tentatively, Madrid and Mexico D.F. Spanish (Hualde & Prieto 2015: 377). Thus we 
have to be cautious with using findings from polar questions with question tags 
Figure 38: TP03_MT_ES_4425 (por la boca de la olla nomás ‘just by the mouth of the pot’;
declarative with continuation rise, final oxytone). 


in order to argue a point about neutral polar questions. However, as we will see 
below (section 5.1.2.2), there is evidence for a particular intonational contour used 
for confirmation-seeking questions (a ‘circumflex’ or rising-falling contour) that is 
different from that of neutral polar questions in our data. This confirmation-seek- 
ing question contour however never occurs on the polar questions with tags, 
which instead seem to use the same contour as neutral polar questions (suggesting 
a kind of workshare relationship between the tag and the confirmation-seeking 
intonation). With this in mind, let us consider tag questions and what might be 
learned from them for the analysis of neutral polar questions in general. In such 
tag questions with no where the final word is a paroxytone, a peak forms on the 
final accented word, preceded by a low stretch with an elbow in a previous sylla- 
ble, just as in declaratives; then after the peak, pitch returns to a low stretch on the 
posttonic syllable, apparently forming a low target, before then realizing the famil- 
iar final rise on the tag. See Figure 39 where the second valley, after the peak on 
the final stressed syllable, reaches even lower than preceding low stretches in the 
utterances, making it unlikely that this is due just to “sagging” interpolation. With 
a final oxytone, the same picture emerges: see Figure 40, where while the posttonic 
valley is not as deep as in Figure 39, the tag itself has a considerably longer duration 
than it has there. 

The posttonic valley in the tag questions is straightforwardly analyzed as a low 
target indicating the presence of a low tone before the final H boundary tone, so 
that a final bitonal LH% is most plausible. In turn, this leads to two possible alterna- 
tive analyses for neutral polar questions: in the first, neutral polar questions differ 
from both polar tag questions and declaratives with continuation rise in the iden- 
Figure 39: SO39_MT_ES_3237¹²³ (llegas a una escoba, no ‘you get to a broom, right’; polar question
with question tag after final paroxytone). 
Figure 40: ZZ24 Quien ES 0788 (no puedo agregar, no ‘I can't add (to that), right’; polar question
with question tag after final oxytone). 


Table 9: Comparative analysis (alternatives I and II) between neutral polar questions, declaratives
with continuation rise and tag questions.

Utterance type                            Prefinal (prenuclear)   Final (nuclear)   Boundary
                                          accent                  accent            tone(s)
declarative with cont. rise               LH*                     LH*               H%
tag question                              LH*                     LH*               LH%
neutral polar question (Alternative I)    LH*                     L*                H%
neutral polar question (Alternative II)   LH*                     LH*               LH%

123 https://osf.io/3v8wy/ 
tity of their final (nuclear) accent, and tag questions differ from the other two via
their boundary tone(s). This analysis is given in Table 9 as alternative I. Tag ques- 
tions are then intonationally different from both neutral polar questions and from 
biased questions without tag (for those, see below). 

Another possible analysis is that tag questions are intonationally the same as 
neutral polar questions, but that both are different from biased polar questions 
without tag. This requires the assumption that also in neutral polar questions 
without tag, the final accent is LH* and the final boundary tone LH%, but that when 
these tones come into conflict with one another due to time pressure (as with final 
oxytones and often also paroxytones), the boundary L wins out in the realization 
against the pitch accent H (which is truncated or severely compressed), precisely 
because in this way they are still differentiated from declaratives with continuation 
rise. A rather welcome consequence from the point of view of having a tidy system 
would be that in this way, phonologically both prefinal and final pitch accents are 
the same (LH*) across the board, i.e. in declaratives and all types of polar questions, 
while differences between them are implemented via boundary tones. This second 
analysis is summarized in Table 9 as alternative II. Neutral polar questions with a 
final proparoxytone would be an excellent testing ground for deciding between 
these two competing analyses: alternative I predicts a simple rise from the final 
stressed syllable to the end of the utterance across the intervening syllables, but 
under alternative II, we could expect to see the same rise-fall-rise pattern as in tag 
questions, fully realizing each tone of the LH* LH% configuration, since time pres- 
sure conditions are more relaxed. 
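The two alternatives can be set side by side by encoding each row of Table 9 as a triple of tonal labels and listing the positions in which two utterance types differ. This is merely an illustrative encoding of the table; the names used are my own and nothing in it goes beyond the four rows given there.

```python
from typing import Dict, List, Tuple

Analysis = Tuple[str, str, str]  # (prenuclear accent, nuclear accent, boundary tone(s))
FIELDS = ("prenuclear accent", "nuclear accent", "boundary tone(s)")

DECLARATIVE_CONT_RISE: Analysis = ("LH*", "LH*", "H%")
TAG_QUESTION: Analysis = ("LH*", "LH*", "LH%")
NEUTRAL_POLAR: Dict[str, Analysis] = {
    "I": ("LH*", "L*", "H%"),
    "II": ("LH*", "LH*", "LH%"),
}


def differences(a: Analysis, b: Analysis) -> List[str]:
    """Names of the positions in which two tonal analyses differ."""
    return [field for field, x, y in zip(FIELDS, a, b) if x != y]


for alt, neutral in NEUTRAL_POLAR.items():
    print(f"Alternative {alt}:")
    print("  neutral vs. declarative:", differences(neutral, DECLARATIVE_CONT_RISE))
    print("  neutral vs. tag question:", differences(neutral, TAG_QUESTION))
```

Under alternative I, neutral polar questions differ from declaratives with continuation rise only in the nuclear accent; under alternative II they differ only in the boundary tones and are tonally identical to tag questions.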
Figure 41: LC34 Quien ES 1465 (es nuestra compañera ‘is it our classmate’; neutral polar question
with final paroxytone). 
Unfortunately no neutral polar questions with final proparoxytone have been
found in the corpora used here. In their absence, we can still consider a polar question
utterance like LC34 Quien ES 1465, given in Figure 41: it has a final paroxytone
where a first low elbow is formed in the pretonic at the end of the first syllable, then 
a slow rise follows that forms a peak or very short plateau at the end of the stressed 
syllable, and then a much steeper rise concludes the pitch movement towards the 
end of the utterance. Under alternative I, the first rise and the change in rise speed 
have no real explanation, since it predicts a low target in the final stressed syllable 
which would simply be a continuation of the low stretch that began after the last 
prefinal accent and which would therefore be at the same level as the first elbow in 
the final word. Under alternative II, the first rise is the realization of the LH* pitch 
accent on the final word, and the short plateau could be explained as the result of 
the L tone of the LH% boundary tones being undershot due to it competing with the 
H*, and once this competition is over, the final H% tone is the only factor affecting
pitch level, going some way towards explaining the difference in rise times. Thus,
this example (and others like it) seems to favour alternative II, but the evidence is
somewhat inconclusive. 


5.1.2.2 Polar questions biased towards confirmation (in the maptasks) 

As mentioned above already, there are several examples of utterances in the data 
studied here that can be analysed as polar questions with a confirmation bias 
(checks) and have a distinguishing prosodic form. This form roughly corresponds 
to what has been called the “circumflex contour” for polar questions in the description
of other varieties of Spanish (Hualde & Prieto 2015). In particular, there are
frequent sequences in the maptask corpora where a certain type of these questions
regularly occurs. They can be schematized as follows (step b being the confirmation-seeking
question):


(42) Schematic sequence in the maptasks in which a type of check occurs 
a. Speaker with map: you move in relation y to landmark x 
b. Speaker without map: relation y to landmark x ? 
c. Speaker with map: [Confirms] 
d. Speaker with map: [Proceeds to next instruction] 


These should be considered as questions with confirmation bias because 1) the 
epistemic imbalance between the speakers that is inherent in the game (one 
speaker has the path on their map, the other doesn't have it) effects a general bias 
for the speaker without the map to request information and the other to provide 
it, especially with regards to how to follow the path. 2) The information has previ- 
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ously been given (in step a), so it is not likely for the speaker without the map to 
intend the utterance produced in step b to be a completely neutral question.’”° 3) 
the speaker with the map crucially understands the utterance in step b to be some 
kind of request, as evidenced by them regularly giving a token of confirmation (step 
c), and only then (step d) moving on to giving further instructions. If step b were 
just a confirmation or acknowledgment itself (like “message received, I repeat:
relation y to landmark x”¹²⁷), then the speaker with the path could proceed to giving
further instructions already at step c, omitting their own confirmation at that step. 
Sequences of that latter type also abound in the corpus, with the speaker without 
the map just uttering a confirmation or acknowledgment token (such as sí, ya, hm, 
or even y) at b and the other then moving on to the next instruction at step c, and 
they can thus be clearly differentiated from the cases discussed here. 
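
The criterion used here to separate the two sequence types can be made explicit in a short sketch. The Python below is only an illustration under stated assumptions: the move labels (`instruct`, `repeat`, `ack`, `confirm`), the speaker codes `G` and `F`, and the function name `classify_step_b` are hypothetical and are not part of the corpus annotation. The sketch encodes nothing more than the observation made above, namely that the reaction of the speaker with the map at step c shows how the utterance at step b was understood.

```python
from typing import List, Tuple

# A move is (speaker, label). "G" = speaker with the map, "F" = speaker without it.
# Labels are hypothetical annotation categories, used for illustration only.
Move = Tuple[str, str]


def classify_step_b(moves: List[Move]) -> str:
    """Classify the move at step b from the reaction that follows it.

    Expects at least three moves: step a (G instructs), step b (F responds)
    and step c (G's next move).
    """
    if len(moves) < 3:
        raise ValueError("need at least steps a, b and c")
    (sp_a, lab_a), (sp_b, lab_b), (sp_c, lab_c) = moves[:3]
    if (sp_a, lab_a) != ("G", "instruct") or sp_b != "F" or sp_c != "G":
        return "other"
    # G confirms a repeated instruction before moving on: step b was a check.
    if lab_b == "repeat" and lab_c == "confirm":
        return "check"
    # G proceeds directly to the next instruction: step b was an acknowledgment.
    if lab_c == "instruct":
        return "acknowledgment"
    return "other"


check = [("G", "instruct"), ("F", "repeat"), ("G", "confirm"), ("G", "instruct")]
ack = [("G", "instruct"), ("F", "ack"), ("G", "instruct")]
print(classify_step_b(check), classify_step_b(ack))
```

Note that the classification deliberately relies on the interlocutor's uptake and not on the form of step b, which mirrors the argument in the text.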

Utterances performing such illocutionary functions have been called clarifi- 
cation requests (Ginzburg 2012; Lupkowski & Ginzburg 2016), and specifically in 
the context of map tasks, very similar utterances have been proposed to be called 
OBJECT in Grice & Savino (1997). Specifically, they contrast OBJECT moves with 
ACKNOWLEDGE moves, which simply acknowledge or confirm instructions given. 
On OBJECT moves, they state that they are “used to point out that there has been a 
break-down in communication, such that the game cannot continue until common 
ground is re-established” (Grice & Savino 1997: 30). It should be noted that Grice & 
Savino (1997), while agreeing with the interrogative nature of these OBJECT moves, 
explicitly separate them from what they classify as CHECKS, although the criteria 
are not quite clear: for OBJECT moves they state that they are what has been clas- 
sified as ‘echo questions’ in the literature because they repeat what has been said 


126 The move in step b might conceivably be performed because the information conveyed by it is 
incompatible with what the speaker without the map expects or believes to be in CG, i.e. as a kind of 
incredulity question. For the utterances discussed in this section, the context does not suggest this 
to be the case. Utterances where the context is compatible with a similar import of incredulity are 
briefly discussed in section 5.1.3.3 with the tentative conclusion that they have the same contour 
shape as the confirmation-seeking questions discussed here, but with the pitch peak aligned later. 

127 This kind of utterance seems familiar from (narrative portrayals of) military-style dialogues. 
However, those are subject to normative control in following a conversational procedure that is 
designed to minimize risk of misunderstanding (especially in the context of less than ideal chan- 
nels of communication, such as giving and receiving instructions over radio accompanied by the 
extreme background noise of armed altercations or heavy machinery) at the expense of increasing 
redundancy. In normal conversation, I argue that the maxim of quantity will discourage such ver- 
batim repetitions of instructions for the sole purpose of confirmation and make them stand out 
(marked) as probably intended to mean more (e.g. being a request for confirmation; cf. also Farkas 
& Bruce 2010: 99 for the view that assertion confirmation is often implicit because it is the least 
marked response). 
before by the interlocutor (Grice & Savino 1997: 30), while CHECKS are defined as
confirmation-seeking questions asking for information “which the speaker believes 
has already been conveyed” (Grice & Savino 1997: 29). This is not a satisfying sep- 
aration, since one definition is concerned with the form and the other with the 
meaning of the utterances (and both definitions could therefore be true about a 
token utterance of either type). Furthermore, OBJECT moves are said to be not 
“simple questions”, since “they could be responding within one game as well as 
initiating another (sub-) game” (Grice & Savino 1997: 30–31). It is not clear how this
objection does not equally apply to those utterances classified by them as CHECKS. 

It is clear that not all types of biased questions either fulfill exactly the same 
function or have the same form. They also often make it difficult to maintain a 
categorical separation between declaratives and interrogatives (but not between 
assertions and questions); but as argued above, the utterances encountered in the 
maptasks do seem to fulfill the criteria of being questions and of being biased in 
the sense that they ask for confirmation about information which the speaker has 
already received. In our corpus, the same intonational form is also used in utterances
of the type occurring at step b even if there are other utterances intervening,
i.e. if step b is not exactly a repetition of the previous utterance by the interlocutor, 
which speaks against an analysis as echo questions. (43) is an example. 


(43) SG15_QF16_MT_ES_0453-0703¹²⁸
     Speakers: SG15 (with the path), QF16 (without the path)

time  utterance                translation               annotation
45.3  ya de ahí pasamos        so from there we move on  = step a (part 1)
46.7  por
47.5  de abajo del             below the
49.9  cómo se llama            what is it called         = step a (part 2), solving the
52.1  mhm                                                  sub-QUD HOW IS X CALLED IN
56.6  su nombre se me ha ido   I've lost the name          RELATION Y TO LANDMARK X?
59.7  corderito                little lamb
60.4  corderito                little lamb
61.5  debajo                   below                     = step b
62.0  mhm ya de abajo          yeah below                = step c
65.0  de ahí pasamos           from there we pass        = step d
66.5  por lado de los          alongside of the
69.4  gentes                   people


128 https://osf.io/jfk4a/


In the sequence, SG15 starts producing the instruction corresponding to step a, but 
only gets to specify the relation y, before a small digression occurs in which the two 
of them solve the sub-QUD of what the landmark x is called. At 59.7 QF16 suggests
corderito; note that this takes the form of a polar question with a final rise. SG15 
confirms and then QF16 performs step b, but he does not produce an echo-question 
repeating the previous utterance. Instead, his question there is related only to the 
relation y that had been asserted before the digression occurred. This debajo in 
step b does not have the form of a neutral polar question with final rise, but the 
“circumflex contour” (cf. Figure 42). Yet, crucially, it is understood as a request for
confirmation by SG15 in step c, when she gives that confirmation, and then pro- 
ceeds to give new instructions in step d. 


Figure 42: QF16_MT_ES_0615¹²⁹ (confirmation-seeking polar question). Cf. context (43).


129 https://osf.io/gby2a/ 
In another sequence, a confirmation-seeking question contrasts with a continuation
rise, the two forming an intonational minimal pair:


(44) QZ13_OZ14_MT_ES_0769-0971¹³⁰

time  QZ13 (with the path)            OZ14 (without the path)  Intonational form
76.9  por el encima del difunto das
      una vuelta debajo del zorro
      above the deceased you turn
      around below the fox
83.3                                  debajo del zorro         “circumflex contour”
                                      below the fox
86.1  estás                                                    neutral polar question
      are you there
92.5                                  debajo del zorro         continuation rise
                                      below the fox
94.0  encima del nube que tiene
      truenos
      above the cloud with the
      thunder
96.7                                  ya
                                      right


Here, QZ13 gives instructions as step a at 76.9, followed by OZ14's first debajo del
zorro at 83.3, which is here step b and has a circumflex contour. This is understood
by QZ13 at least to be a request for suspension of further instructions because he
waits for a while, then asks (estás at 86.1) whether OZ14 has now reached the point
on the map at which debajo del zorro is a reasonable instruction (quite literally
whether they have reached common ground on their maps¹³¹). At 92.5, OZ14 then


130 https://osf.io/rd3vm/ 

131 The commonality here with a sequence of a request for confirmation and then giving this con- 
firmation might be explained thus: as Grice & Savino (1997: 30) point out, such moves asking for 
confirmation on already given information indicate some kind of break in the game which needs to 
be addressed until common ground is reestablished. Instead of treating OZ14's utterance at 83.3 as a
request for confirmation (he doesn't give any), QZ13 seems to interpret it more temporal-spatially 
in terms of the game: OZ14 has not drawn the line on his map up to the point from where he can 
go debajo del zorro, and at which point the two of them have reached the same spot in their joint 
progress across the map (which is a very relevant part of their common ground for this game). A re- 
quest for confirmation is partially a request for suspension of epistemic progression, here applied 
as a suspension of spatial progression. 
repeats his previous utterance, but with the intonation of a continuation rise, and
this is understood by QZ13 immediately as him being ready to continue, and cer- 
tainly not as a request for further confirmation; he proceeds with the next instruc- 
tions and OZ14 acknowledges them without a problem (94.0 and 96.7). 

Regarding their form, these confirmation-seeking questions are very similar 
to what has been called “circumflex contours” in the description of other varieties
of Spanish: on the last accented word, a peak is formed within the stressed sylla- 
ble, with an elbow at the start of its rise in the pretonic and another elbow at the 
end of its rise in the posttonic, and low pitch utterance-finally. For Madrid Spanish, 
Hualde & Prieto (2015: 374) conclude after a review of the literature that the final-rise
question contour is used for “pragmatically unmarked” polar questions, while
the circumflex contour seems to be used for some others, including echo- and 
confirmation-seeking questions. This assessment seems broadly applicable here 
as well, with the difference that here (unlike in Madrid Spanish), it is difficult to 
separate the confirmation-seeking questions from simple declaratives with a final 
fall formally, since those also are characterized by peaks on each accented syllable 
followed by a fall to low after the final one (see section 5.1.1). As such, they contrast 
formally and meaning-wise with neutral polar questions, as we have seen above in 
(43), and also with declaratives with continuation rise, as in example (44) above. 
That is to say, they are used in different contexts with different responses following 
them than neutral polar questions and continuation rises. However, it is not quite 
clear whether they also formally differ from declaratives with final fall in any way. 
Another example is given in Figure 43. This utterance also does not repeat anything 
from the interlocutor's previous utterance. Instead it expresses a conclusion that 
the speaker has reached from the preceding discussion (this is also indicated by the 
update token ah). It is understood by the other speaker to be a request for confir- 
mation insofar as he responds to it by giving this confirmation. Here the final pitch 
accent has more excursion than the preceding ones, and it would be an interesting 
task for the future to see if this difference in excursion really distinguishes this type
of confirmation-seeking polar questions intonationally from declaratives. That is 
what is proposed for Madrid Spanish for distinguishing one type¹³² of confirmation-seeking
questions from declaratives with narrow focus (L+¡H* L% vs. L+H* L%,


132 The situation for Madrid Spanish also seems less than fully clear: at one point, Hualde & Prieto
(2015: 373) analyse an utterance described as a “marked confirmation yes/no question” with the
upstepped nuclear configuration L+¡H* L%. However, in their tabular summary (Hualde & Prieto
2015: 389), they give the configuration L+H* HL% for confirmation-seeking questions (the contour
is not discussed in the context of questions anywhere in the text itself) and L+¡H* L% for echo
questions. In Estebas Vilaplana & Prieto (2010: 29), meanwhile, the nuclear contour H+L* L% is given for
confirmation-seeking polar questions, which however in Hualde & Prieto (2015) is only discussed
for questions in Puerto Rican varieties.

cf. Hualde & Prieto 2015: 373, 389). However, for our data here it is difficult to assert 
this because in both confirmation-seeking questions and declaratives, local pitch 
span varies on the final pitch accent. Solving this issue would require a quantitative 
analysis on more controlled data and probably also perception tests. Pending this, 
we cannot make the claim that the types of biased questions discussed here differ 
formally from declaratives with a final fall. 
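
A quantitative follow-up of the kind just mentioned would need an operational measure of local pitch span. The sketch below assumes, purely for illustration, that the excursion of a pitch accent is expressed in semitones between the F0 at the low elbow preceding the rise and the F0 at the peak. The function names and the values in the usage lines are hypothetical and are not measurements from the corpus.

```python
import math
from typing import Sequence


def excursion_semitones(f0_low_hz: float, f0_peak_hz: float) -> float:
    """Size of a rise from low elbow to peak, in semitones (12 per octave)."""
    if f0_low_hz <= 0 or f0_peak_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_peak_hz / f0_low_hz)


def final_excursion_is_largest(excursions: Sequence[float]) -> bool:
    """True if the last pitch accent has a larger excursion than all earlier ones."""
    if len(excursions) < 2:
        raise ValueError("need at least two pitch accents to compare")
    return excursions[-1] > max(excursions[:-1])


# Hypothetical (low, peak) pairs in Hz for three successive pitch accents.
accents = [(180.0, 210.0), (175.0, 200.0), (170.0, 240.0)]
spans = [excursion_semitones(low, peak) for low, peak in accents]
print([round(s, 2) for s in spans], final_excursion_is_largest(spans))
```

A measure on a logarithmic scale is chosen here so that speakers with different registers can be compared; whether a difference found in this way is perceptually relevant would still have to be tested separately.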


Figure 43: KP04_MT_ES_1975¹³³ (a por ahí voy a subir ‘ah I’ll go up there’; confirmation-seeking polar
question with final oxytone).


The possibility seems to exist that with these kinds of utterances in the maptasks, the 
same intonational contour can be used for declaratives and for confirmation-seek- 
ing polar questions. Is it plausible that that is the case? Firstly, this contour is suf- 
ficiently different from that for continuation rises and that for neutral polar ques- 
tions. It is therefore not likely to be misunderstood to signal a situation where the 
speaker without the map gives an acknowledgement (either via a continuation rise 
contour or with just a token) or where they want to request information on some- 
thing that has not recently been discussed (which can be achieved using the neutral 
polar question intonation). Secondly, the epistemic situation between the speakers 
as given by the game makes assertions about where the path runs by the speaker 
without the path a relatively unlikely event (except for when that speaker describes 
what is on their map). Gunlogson (2001, 2008) discusses “declarative questions” in
English¹³⁴ and develops a framework for understanding their pragmatics. Adapting
from Gunlogson (2008: 113), the speaker without the path is not a source with
regard to that information¹³⁵ and can only make a dependent commitment to any
proposition about where the path runs. A dependent commitment is made if one
speaker is a source for a proposition and the other also commits to it without being
a source for it (this is what happens when a polar question in a neutral context is
answered: the speaker asking the question proposes to make a dependent commitment
on a proposition, not knowing the answer but expecting their interlocutor to
know, cf. Gunlogson 2008: 121). In Gunlogson's (2008) analysis, the formal device of
a declarative signals a context in which a commitment has been made with regard
to the proposition at issue: either by the speaker themself with the utterance (an
assertion), or previously in the discourse. Whether an utterance consisting of such
a declarative is then interpreted as an assertion or a question is dependent upon
context and upon whether the speaker can be a source for the proposition: assertions
can only be made by a speaker who can reasonably act as a source for the
proposition they assert (Gunlogson 2008: 116–117). In our situation, the speaker
without the path is highly unlikely to be a source, whereas the speaker with the
path is known to be one, to both speakers. Thus, for the speaker without the path
to make an assertion about it is odd in the context, first because they would pose
as being a source for it even though it is known to both speakers that they really
cannot be, and second because in any case, the speaker with the path is known by
both to know about the path, so that they do not need to be informed about it (cf.
Gunlogson 2008: 118–120). This sets the odds in favour of an utterance about the
path by the speaker without the path being interpreted as something other than an
assertion. In fact, it is highly likely to be interpreted as a question given this contextual
epistemic constellation, but as a question in a context in which a commitment
to the proposition already exists, which is signaled by the declarative form: this is


133 https://osf.io/bu9tw/

134 In English, both word order and intonation are formal means involved in forming declaratives
and interrogatives, and they sometimes seem to be at odds with one another: Gunlogson (2001,
2008) therefore differentiates between polar questions (with subject-verb inversion and finally
rising intonation), rising declaratives (without inversion, but with finally rising intonation) and
falling declaratives (without inversion and finally falling intonation):

a. Is Bob home? (polar question, the ? symbolizing rising intonation)

b. Bob is home? (rising declarative)

c. Bob is home. (falling declarative, the . symbolizing falling intonation)


Spanish does not normally differentiate polar interrogatives from declaratives by word order, but
many of Gunlogson’s insights about the pragmatics involved are still applicable here.

135 A speaker is a source for a proposition if and only if they are committed to the proposition
and if in the discourse context, their commitment does not depend on another speaker’s stated 
commitment to the proposition (cf. Gunlogson 2008: 113). In our case, the speaker without the path 
can only be informed about the path via testimony of the speaker with the path, so they cannot be 
a source for it (and this is known to both speakers in the context). 
felicitous, since a commitment has in fact already been made by the speaker with
the path (often, but not always, in the directly preceding utterance). Thus the into- 
national contour under discussion can remain a declarative contour, unspecified as 
to whether it realizes an assertion or a question, with the possibility of being either 
where specific contexts allow or suggest it. 
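
The reasoning adapted from Gunlogson (2008) can be summarized as a small decision procedure. The following Python sketch is a deliberate simplification and not Gunlogson's own formalization: the function name, the two boolean parameters and the three output labels are assumptions introduced here for illustration, and the sketch ignores all contextual factors other than source-hood and prior commitment.

```python
def interpret_declarative(can_be_source: bool, prior_commitment: bool) -> str:
    """Likely interpretation of an utterance with declarative form.

    can_be_source:    the speaker can reasonably act as a source for the
                      proposition (independent evidence is available to them).
    prior_commitment: another participant has already committed to the
                      proposition in the discourse.
    """
    if can_be_source:
        # Simplification: a possible source is taken to be asserting,
        # whether or not somebody else has committed before.
        return "assertion"
    if prior_commitment:
        # Only a dependent commitment is possible, so the declarative is
        # read as a question seeking confirmation of that commitment.
        return "confirmation-seeking question"
    # No source and no commitment in the context: declarative form is
    # not licensed under this simplified view.
    return "not licensed"


# Speaker without the path, after the instruction has been given (step b).
print(interpret_declarative(can_be_source=False, prior_commitment=True))
# Speaker with the path giving an instruction (step a).
print(interpret_declarative(can_be_source=True, prior_commitment=False))
```

The point of the sketch is that the intonational contour itself does not enter into the decision: the same form yields different interpretations depending only on the epistemic constellation.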


5.1.2.3 Alternative questions 

Alternative questions provide two or more alternatives, as in a list, from which the 
addressee is to form a true answer (cf. Sadock & Zwicky 1985: 179; König & Siemund
2007: 291–292). The answer set is therefore not just {p, ¬p} as in polar questions, but
may contain more elements. 


(45) ¿Quieres agua o café? 
‘Do you want water or coffee?’ 


Krifka (2011: 1749) points out that without intonation, questions like (45) are for- 
mally indistinguishable from polar questions with a disjunctive constituent, quieres 
(agua o café)?, and (45) is furthermore ambiguous (without intonation) between 
an open-set and a closed-set interpretation: it might be interpreted to mean that 
the two options are water and coffee, and none else, or that water and coffee are 
just two of the available options. In the data discussed here, there are always just 
two named alternatives, and very often their semantics together with the context 
suggest that they are also the only two possible ones. For example, in a situation 
when the path in a maptask leads up to a landmark from the right side, and the 
speaker without the map asks whether they should go above or below it, these are 
the only relevant alternatives in the discourse. In principle, it would of course also 
be possible to go alongside it or around it, but since the position of the path up to 
that point is fixed already, and the path is thus coming towards the landmark from 
a given side on a two-dimensional map, for the purposes of the game only two 
alternatives are relevant. 

Alternative questions are not overly frequent, even less so than polar and 
wh-questions (around 20 in all the corpora here taken together), and they are not 
evenly distributed across all speakers. Thus we cannot say whether the general- 
izations made here are valid for all speakers, but they do cover all of the exam- 
ples encountered, and not just the ones presented here. The generalizations are: 
whether the final word of the first alternative is proparoxytone (Figure 44), parox- 
ytone (Figure 45), or oxytone (Figure 46), a pitch accent that can once again uncon- 
troversially be analyzed as LH* is associated with its stressed syllable, realized as 
a preceding low stretch and elbow before the stressed syllable and a peak within
the stressed syllable. A high boundary tone is produced at the end of the phrase 
corresponding to the first alternative (with final oxytones, the peak of the pitch 
accent and that of the boundary tone are realized in one, just as in other utterance 
types). At the end of the second alternative, pitch is low, taken as evidence for a low 
boundary tone (L%). A further generalization (with the possible exception of KA36_ 
Quien_0197, given in Figure 47) is that in the phrase corresponding to the second 
alternative, no visible or audible pitch accents are realized on stressed syllables, 
whether they are final or pre-final. Pitch falls steeply from the high boundary at the 
end of the first alternative, and then either continues to fall slowly or remains very 
low. The phrase of the second alternative is thus deaccented or realized with an 
extremely reduced pitch range. Material preceding the first alternative is normally 
accented (cf. Figures 44 and 46). As already mentioned, Figure 47 shows the only 
utterance where the second alternative could be seen as still forming pitch accents. 
However, it should be noted that this utterance is produced in a particularly low 
register and with only a relatively small pitch range overall (as is typical for this 
speaker). This means that the sustained pitch in the first syllable of mujer might 
also just be the product of elevation due to consonantal microprosody caused by 
the fricative at the beginning of the final stressed syllable (cf. a similar elevation at 
the beginning of mujer in Figure 46, where however it is very small compared to the 
intonational pitch movements). 


Figure 44: KP04_MT_ES_3219¹³⁶ (hacia qué lado voy a ir por encima del murciélago o por debajo del
murciélago ‘which side am I going to go, above the bat or below the bat’; alternative question with 
two final proparoxytones). 


136 https://osf.io/qmtew/ 
Figure 45: QF16_MT_ES_0136¹³⁷ (por encima o por abajo ‘above or below’; alternative question with
two final paroxytones).


Figure 46: AZ23_Quien_ES_0020¹³⁸ (en la tarjeta tiene nombre de varón o de mujer ‘on the card does it
have the name of a man or woman’; alternative question with two final oxytones).


Deaccentuation or pitch compression will also be discussed in more detail in the 
context of declaratives in section 5.1.3.2. Its prevalence makes it necessary to 
include a reference to scaling in the intonational analysis of alternative questions 
here. From Figure 44, this reduced scaling can be seen to apply to a prosodic unit 
larger than an individual prosodic word, corresponding to the second alternative. 
The analysis is given in (46) in two versions, one with, one without, deaccentuation. 
The crossed-out notation is intended to convey the presence of accentable words 
that are deaccented or whose scaling is severely reduced, round brackets indicate 


137 https://osf.io/xhdu7/ 
138 https://osf.io/gt423/ 
Figure 47: KA36_Quien_ES_0197¹³⁹ (es uh hombre o mujer ‘is it uh a man or woman’; alternative
question with a final paroxytone and a final oxytone). 


optional (pre-nuclear) pitch accents. In section 5.2 the nature of the prosodic units 
involved will be discussed in more detail. 


(46) Intonational analysis for alternative questions
a. With deaccentuation
[(LH*) LH* H-/%]alternative 1 [(̶L̶H̶*̶)̶ ̶L̶H̶*̶ L%]alternative 2
b. Without deaccentuation
[(LH*) LH* H-/%]alternative 1 [(LH*) LH* L%]alternative 2
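
The two versions of the analysis differ only in whether the accents of the second alternative are realized. As a minimal sketch of this, the Python below generates the tonal string for both versions. The class and function names are hypothetical, and angle brackets are used here as an ad hoc plain-text stand-in for the crossed-out notation; neither is part of the analysis itself.

```python
from dataclasses import dataclass


@dataclass
class Accent:
    label: str = "LH*"
    realized: bool = True  # False = deaccented or severely compressed

    def __str__(self) -> str:
        # Angle brackets mark accents that are present but not realized.
        return self.label if self.realized else f"<{self.label}>"


def alternative_question(prenuclear_1: int = 0,
                         accentable_2: int = 1,
                         deaccent: bool = True) -> str:
    """Tonal string for a two-way alternative question.

    prenuclear_1: number of optional prenuclear accents in alternative 1
    accentable_2: number of accentable words in alternative 2 (at least 1)
    deaccent:     whether alternative 2 is deaccented
    """
    if prenuclear_1 < 0 or accentable_2 < 1:
        raise ValueError("invalid number of accentable words")
    first = [Accent() for _ in range(prenuclear_1 + 1)]
    second = [Accent(realized=not deaccent) for _ in range(accentable_2)]
    part1 = " ".join(str(a) for a in first)
    part2 = " ".join(str(a) for a in second)
    # High boundary after alternative 1, low boundary after alternative 2.
    return f"[{part1} H-/%] [{part2} L%]"


print(alternative_question(1, 2, deaccent=True))   # [LH* LH* H-/%] [<LH*> <LH*> L%]
print(alternative_question(1, 2, deaccent=False))  # [LH* LH* H-/%] [LH* LH* L%]
```

Treating deaccentuation as a property of the whole second phrase, and not of individual words, reflects the observation that the reduced scaling applies to a unit larger than the prosodic word.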


5.1.2.4 Wh-questions 

The wh-questions attested in the corpora here present a heterogeneous picture 
intonationally. All of them form a pitch accent on the wh-word which is most likely 
to be analyzed once again as LH*. In all those where further accentable words 
precede the wh-word, they are also accented with an LH* pitch accent. Almost all 
wh-questions end on a final fall, but a small minority has a final rise. It is the part 
between the wh-word and the end of the utterance where the data is most ambig- 
uous. If the wh-word is the last word in the utterance, then there is simply a steep 
fall after the peak on its stressed syllable (cf. Figure 48). 

However, if more accentable words intervene, a variety of things can happen: 
firstly, after the wh-word, which has the highest pitch peak, pitch can fall progres- 
sively over the following accentable words, with or without compressed but identi- 
fiable pitch accents on each (cf. Figures 49 and 50). 


139 https://osf.io/cdrwf/ 
Figure 48: QF16_MT_ES_2110¹⁴⁰ (y ahí llego a dónde ‘and there I get to where’; wh-question with the
wh-word finally and preceding accented words).

Figure 49: OV37_Quien_0155¹⁴¹ (en qué año murió ‘what year did s/he die’; wh-question with the
wh-word followed by accentable words without identifiable pitch accents).


In several examples, the accentable words following the wh-word do not behave 
uniformly: up to a certain word from the wh-word, pitch is maintained at a rel- 
atively high plateau which can be taken as evidence that they are accented, then 
there is a steeper fall than in the previous examples, after which pitch is at a rel- 
atively low level and all following words are not accented. Where this is the case, 
the steep fall between the accented and the unaccented stretch does not occur at 
a uniform syntactic boundary: it can occur between an auxiliary and a verb (cf. 
Figure 51), between a verb phrase and a postposed subject (cf. Figure 52), or directly 
after the pitch accent on the wh-word (cf. Figure 53). 


140 https://osf.io/39h2e/ 
Figure 50: KA36_Quien_1075¹⁴² (cuándo va a cumplir su periodo ‘when is their term going to end’;
wh-question with the wh-word followed by accentable words with pitch accents).


Figure 51: TP03_MT_ES_4108¹⁴³ (ya dónde te has quedado ‘so where did you get left’; wh-question with
a post-wh-word fall between an auxiliary and a verb). 


That it is not always just one (the last) word which is deaccented/realized on the low 
pitch stretch (cf. Figure 53), makes it unlikely that the final pitch accent is L* while 
all preceding ones are LH*. The only possible generalization over all of these cases is 
one making use of optionality: at some point after the wh-word, deaccentuation (or 
severe compression) may occur. It seems reasonable to assume that deaccentuation 
happens after some kind of phrase boundary. However, the resulting phrases then 
do not correspond very well to syntactic or information-structural categories, since 
it is neither always just the wh-word, or the larger wh-NP, as in cuántos años (that 
could be said to be focused), nor something like the wh-NP + the following VP that 


142 https://osf.io/ht96p/ 
143 https://osf.io/detz6/ 
Figure 52: OV37_Quien_0446¹⁴⁴ (y cuántos años ha tenido el señor ‘and how old was the gentleman’;
wh-question with the wh-word followed by accentable words with and without accents).

Figure 53: HA30_Quien_ES_0571¹⁴⁵ (cuántos años tiene ‘how old is s/he’; wh-question with two
deaccented words after the wh-word). 


are separated as one from the following material, nor is the deaccented material 
itself homogeneous in this way. Nor is deaccentuation itself obligatory (cf. Figure 50). 


(47) Analysis for wh-questions with optional deaccentuation 
[LE wh-word (LH*)] ([ LH* p L% 


It is also questionable whether cases like Figure 49, where there is no steep fall after 
the wh-word but instead a slowly falling stretch that essentially seems to be interpo- 
lation, are phonologically different from the cases with steep falls. It could be that this 
really is just interpolation, in which case there would be no pitch accents on the accentable


144 https://osf.io/pcn4b/ 
145 https://osf.io/8geu6/ 
words after the wh-word and this would also be a case of deaccentuation. It could, however,
also be that this contour is the result of successive downstep and pitch compression
on the pitch accents following the wh-word, so that these cases would not be deaccented
but just compressed. I will tend towards the second explanation, since that allows us to
unify deaccentuation as a phenomenon where the final L tone creates a low stretch 
aligned as far leftwards as possible, which then results in the typical steep falls fol- 
lowed by low stretches. However, the difference between compression and deaccen- 
tuation is clearly gradual and thus this picture somewhat oversimplifies (see also 
section 5.3.3). None of the prosodic variation in wh-questions concerning scaling here
seems to correlate with any pragmatic differences. This is arguably different
for those wh-questions that have a final rise. The few instances that can be found are 
all variations of asking cómo? “How/what?” that can be interpreted as a request for 
repetition because the speaker has not heard or understood well what the other was 
saying. Thus, these are not really asking for a variable to be filled in an open proposi- 
tion, but essentially just for repetition of the preceding utterance, and they are under- 
stood as such (as evidenced by the reaction of the interlocutor, cf. (48) and Figure 54). 


(48) AZ23_ZZ24_MT_ES_0102-0195 (context for AZ23_MT_ES_0153)¹⁴⁶

time  ZZ24 (the one with the path)            AZ23 (the one without the path)
10.2  y por el centro de la quena y el
      cerro pasa el camino
      and through the middle of the flute
      and the hill does the path go
15.3                                          cómo va
                                              how do you mean
16.6  por el cerro y la quena por la mitad
      pasa
      it goes through the middle of the hill
      and the flute


5.1.2.5 Summary 

The discussion of interrogative intonation has produced a number of interesting 
results. Despite a demonstrable range of pragmatic differentiation in the discourse 
contexts from which the data is drawn, the intonational form of these utterances seems 
comparatively unaffected by it, if not by prosodic context. The signaling of different 
utterance types and other pragmatic meanings seems to be achieved in large part by 

Figure 54: AZ23_MT_ES_0153 (wh-question with a final rise, a request for repetition).


other means, or is left to the context. In fact, if we take the analysis called “alternative
II” for polar interrogatives and some form of the deaccentuation analysis for wh-questions
to be correct, then so far pitch accent choice in the data as a whole has remained
paradigmatically invariant, with a single pitch accent, LH*, occurring in both prenu- 
clear and nuclear position in declaratives, all types of polar questions, alternative 
questions, and wh-questions. The edge tone inventory is only slightly larger, composed 
of two monotonal (L-/% and H-/%) and one bitonal (LH-/%) boundary tones. Certainly 
the results here cannot be called definitive in the sense that they can exclude the pos- 
sibility of further pragmatically-conditioned variation in paradigmatic tone choice in 
Huari Spanish as a whole. Yet even so, these findings suggest a very regularized pitch 
accent system, especially when compared to the rich tonal inventory described
for Madrid Spanish (cf. section 3.4.3). However, we found evidence that a more syntagmatic
device aids in the prosodic cueing of utterance type in both wh-questions
and alternative questions, namely deaccentuation or reduced scaling of material fol- 
lowing the highest pitch accent, sometimes extending across entire phrases. 


5.1.3 Patterns of (information structurally-conditioned) variation in declaratives 


In what follows, variations on the main accentuation will be described and possible 
functional explanations (prosodic and information structural) for them explored. 


5.1.3.1 Phrase accentuation 

With the term “phrase accentuation” I want to label an intonational variant occurring
in Huari Spanish declaratives whereby not every accentable word is pitch
accented, but only the last one in a larger phrase (cf. Figures 55 and 56). This is in
contrast to the “main” accentuation variant.

Figure 55: MS27_Cuent_ES_1520¹⁴⁷ (y que los animales estaban maltratando ‘and which the animals
were mistreating’; declarative with strong phrase accentuation). Cf. context (53).




Figure 56: ZR29_Cuent_ES_1572 (se lanza hasta mientras pasa el anciano ‘it launches itself while the
old man passes by’; declarative with strong phrase accentuation). Cf. (52) for context.


The two accentuation variants should not be seen as completely separate: it seems 
that a variation space exists in which the “main” accentuation variant is at one
end, and a complete “phrase accentuation” at the other, and the distance between
them is navigated by gradual degrees of pitch scaling. In between we find examples 
of utterances where each word clearly has a pitch accent but the last one has the 
largest excursion (cf. Figure 57), and others where the pitch accents on the non-final 
words have very small excursion locally, while only that on the last one is readily 
identifiable (cf. Figure 58). 


147 https://osf.io/u2bzp/

Figure 57: QF16_MT_ES_1011 (si me dices debajo del corderito tengo que cruzar nuevamente se supone
llegar arriba ‘if you tell me below the little lamb I have to cross again supposedly get to above’;
declarative with weak phrase accentuation).


Figure 58: TP03_Cuent_ES_0561 (y el anciano seguía su camino ‘and the old man continued on his
way’; declarative with weak “phrase accentuation”). Cf. context (51).


The peak on the final word in these phrase accentuation examples is still aligned 
within the stressed syllable. But it is also followed by a final fall, indicating the presence
of a low boundary tone. Thus, the domain over which “phrase accentuation”
occurs in these cases seems to be the same domain whose right edge is marked by
boundary tones. This should therefore be either the phonological phrase,
the intermediate phrase or the intonational phrase, depending on which other evi- 
dence there is for an independent existence of these domains. In section 5.2, a more 
definite analysis will be provided. For our description now, it suffices to say that 
it is a phrase larger than an individual prosodic word. In the strongest cases of 
“phrase accentuation”, we thus have only one pitch accent per such phrase, associated
with the final word in it. Thus pitch scaling is here used to signal a position
of relative strength of the last accentable word in such a phrase compared to all 
the preceding words. This rightmost prominence relation is in general assumed 
to hold for phrases in Spanish at least in so-called broad-focus contexts (Hualde & 
Colina 2014: 266-268; Ladd 2008, cf. also section 3.7.3). It is also assumed to hold 
in a phrase in the “main” accentuation mode (with all pitch accents having more
or less the same scaling) if nothing else changes. The phrase accentuation is thus
an optimized expression of this relation via pitch scaling: whereas in the “main”
accentuation, final prominence either has to be signaled by other means or is
simply the default expectation (Ladd 2008: 257–258, cf. section 3.7.3), in the “phrase
accentuation” it is openly signaled. A very schematic comparison between “main”
and “phrase” accentuation is given in (49), with more gradually differing intermediate
steps omitted.


(49) Schematic comparison between “main” and “phrase” accentuation 


             a) “main” accentuation    b) intermediate step      c) “phrase” accentuation
   tones     LH*  LH*  LH* L%          LH*  LH*  LH* L%                    LH* L%
   syllables XXX  XXX  XXX             XXX  XXX  XXX             XXX  XXX  XXX
   words      x    x    x               x    x    x               x    x    x
   phrase               x                         x                         x


The “main” accentuation (49)a) and full “phrase” accentuation (49)c) versions of 
the same hypothetical utterance here do not differ in their metrical representa- 
tion, in which the final word in the phrase is assigned the highest prominence in 
the phrase. This predicts that both versions should cue ambiguously between a 
reading in which focus is on the final constituent and a “broad-focus” reading. The 
main accentuation is furthermore also partially ambiguous with respect to a reading
where one of the pre-final constituents is in focus, if no deaccentuation occurs (cf. 
Figures 66 and 67 in section 5.1.3.2). The phrase accentuation would certainly be less 
frequently expected in such contexts because it cues rightmost prominence more 
clearly. Thus, the “phrase accentuation” cues a metrical structure which is ambig- 
uous between two information structural readings (focus on the final constituent 
vs. broad focus), while the “main accentuation” is in principle ambiguous between 
three readings (focus on the final constituent vs. focus on a pre-final constituent 
vs. broad focus) and metrical structures with either final or prefinal prominence. 
The phrase accentuation optimizes the phrase as well as its final accent, because 
it treats the phrase as the domain of pitch accent culminativity, implemented partially 
gradually via scaling. Its pitch movement is also delimitative in that it clearly marks 
the right edge of the phrase in which the strongest word is final. In this it differs from
the main accentuation, which does not signal the extent of the phrase via the pitch 
contour, but instead optimizes each prosodic word within it. These different opti- 
mization strategies are further exemplified in Table 10. It describes the continuum 
from main accentuation (a), most word-optimizing, to the different modes of phrase 
accentuation (b, c, and d). The rightmost case (d) is one where all pitch accents associated
with stressed syllables are gone and only a pitch movement associated with
the phrase edge occurs. This optimizes the phrase by pitch event culminativity, and 
by delimitation, but the pitch event is no longer located at the strongest position. It 
does not signal individual prosodic words at all. Some examples for this mode were 
already given in section 5.1.1.2 and we will also see some more later.¹⁵¹


Table 10: Optimization of different prosodic domains by differences in pitch accentuation and
excursion.

more word-optimizing <----------------> more phrase-optimizing

a) Realization: each word in a phrase with rightmost prominence has a pitch accent on its
   stressed syllable with similar excursion.
   Optimization: most word-optimizing (pitch accent culminative on prosodic word, equal scaling).

b) Realization: each word in a phrase with rightmost prominence has a pitch accent on its
   stressed syllable, but excursion on the last one is larger than on the preceding ones.
   Optimization: both word-optimizing and prominent word-optimizing (pitch accent culminative
   on prosodic word, unequal scaling).

c) Realization: only the last word in a phrase with rightmost prominence has a pitch accent
   on its stressed syllable.
   Optimization: optimizing both the most prominent word and the phrase (pitch accent
   culminative on phonological phrase and located on strongest position, pitch event
   delimitative for the phrase).

d) Realization: pitch movement corresponding to a boundary tone on the phrase but not
   necessarily on the stressed syllable.
   Optimization: most phrase-optimizing (pitch movement culminative and delimitative for
   phonological phrase).
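
The four columns of Table 10 can be illustrated with a small sketch. It is not part of the analysis itself: it assumes that the pitch excursion on the stressed syllable of every accentable word has already been measured (here in semitones) and that the presence of a pitch movement at the right phrase edge is known. The function name `classify_accentuation` and both threshold values are hypothetical, and the hard cut-offs only discretize what the text describes as a gradual continuum.

```python
from typing import List


def classify_accentuation(
    excursions: List[float],
    edge_movement: bool,
    accent_threshold: float = 1.5,  # hypothetical cut-off (semitones) for counting a pitch accent
    ratio_threshold: float = 1.5,   # hypothetical factor by which the final accent must exceed the others
) -> str:
    """Map one phrase onto column 'a', 'b', 'c' or 'd' of Table 10.

    excursions: pitch excursion on the stressed syllable of each accentable
    word, in phrase order (assumed to be measured beforehand).
    edge_movement: whether a pitch movement occurs at the right phrase edge.
    """
    if not excursions:
        raise ValueError("a phrase needs at least one accentable word")

    accented = [e >= accent_threshold for e in excursions]

    # (d): no stressed syllable carries a pitch accent, only the edge is marked
    if not any(accented):
        if edge_movement:
            return "d"
        raise ValueError("neither a pitch accent nor an edge movement")

    # (c): only the final word is pitch accented
    if accented[-1] and not any(accented[:-1]):
        return "c"

    # (a) vs. (b): every word is accented; compare final and pre-final scaling
    if all(accented):
        prefinal = excursions[:-1]
        if excursions[-1] >= ratio_threshold * max(prefinal):
            return "b"
        return "a"

    raise ValueError("pattern lies outside the continuum of Table 10")
```

With these assumed thresholds, a phrase with excursions of 0.3, 0.5 and 4.0 semitones would fall under (c), while three roughly equal excursions would fall under (a). A phrase consisting of a single accented word is treated as (c) here, which is a simplification.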


The phrase accentuation variant is not only different from the main accentuation 
with regard to how it signals the prominence of the final word (via pitch accent


151 Option (d) is not merely a case of option (c) without an additional low boundary tone, as the 
comparison between speakers ZE55, on the one hand, and XJ45 and NQ01, on the other, in section 
5.2.4 will demonstrate. 
culminativity and delimitation), but also in how it signals the non-prominence of the
non-final words (via a reduction in pitch scaling up to what amounts to deaccentu- 
ation). It would be hard to reconcile such a realization with a reading where one of 
the pre-final constituents is at-issue. Taking this prediction as a point of departure, 
is it possible to find functional factors that favour the phrase accentuation? In the 
following, we will explore the possibility of identifying contexts in which it prefera- 
bly occurs. The phrase accentuation does not occur in the speech of all the speakers 
studied here. In the data considered here, only 15 speakers produce something like it, 
and for all of them it is a minority mode of accentuation, while the majority of their 
utterances is produced in the main accentuation (cf. also its low occurrence in the 
Conc data discussed in 5.1.1.2). It is unknown whether this reflects a true distribution
of this accentuation phenomenon among the speakers, or whether it is a sampling 
artefact, but overall it speaks to the variability that is at the speakers’ disposal. As far 
as contexts favouring phrase accentuation can be identified, they are only potential 
loci for it, not categorical ones. The following discussion will argue that there are 
identifiable aspects of contexts that increase the overall likelihood of occurrence of 
the phrase accentuation, under the assumption that the phrase accentuation is felic- 
itous when no particular highlighting of pre-final constituents is intended or plausi- 
ble from the context, and when instead either the final constituent or the structure 
of the higher prosodic unit (the phrase) is to be emphasized. The analysis here is 
exploratory and looks at individual examples in context. For the future, a more con- 
trolled quantitative investigation would be desirable to confirm its results. 


5.1.3.1.1 Broad or final narrow focus on an utterance 


Figure 59: HA30_Cuent_ES_2285¹⁵² (two declaratives, the first with main, the second with phrase
accentuation). 


152 https://osf.io/zudme/ 
(50) ZR29_HA30_Cuent_ES_2268-2361 (context for HA30_Cuent_ES_2285)¹⁵³
General context: they are telling a story about an old man (señor), who wants
to go visit his granddaughter and meets several obstacles on the way. ZR29
already told the story, now it is HA30’s turn.
Speakers: ZR29 (the one who tells the story first), HA30 (the one who re-tells the story)
time   speaker  utterance
226.8  HA30     en eso luego
                so then
227.7  ZR29     se encuentra con un gi[gante
                he encounters a giant
228.5  HA30     ah se] encuentra con un gigante y
231.6  HA30     y ese gigante era un cóndor
                ah he encounters a giant and that giant was a condor
233.7           (they both laugh)


(50) is the context for HA30_Cuent_ES_2285 (Figure 59). In the excerpt given in the
figure, HA30 first reproduces ZR29's preceding assertion that the old man met a 
giant (se encuentra con un gigante y). Both HA30's and ZR29's utterances here are 
answers to the QUD WHAT HAPPENED THEN TO THE OLD MAN?, overtly expressed 
by HA30 at 226.8. The giant (gigante) is introduced here for the first time in the 
story.¹⁵⁴ All accentable words are pitch accented. In the second utterance (y ese
gigante era un cóndor), only the final word cóndor is pitch accented; notably, no 
relevant pitch movement takes place on gigante. The most plausible QUD here is 
something like WHO OR WHAT WAS THAT GIANT?, with gigante given and not at-issue. 
The giant (gigante) is overtly mentioned because it is the topic switched to here 
(the preceding utterances were predications on the discourse topic el señor (the
old man), omitted as subject), but it is given and not intended to continue as dis- 
course topic (the story continues with the old man as protagonist, now meeting the 
condor). Arguably, all of this contributes to the difference in accentuation variants: 
at 228.5, gigante introduces a new discourse referent and is both phrase-final and 
pitch accented; at 231.6 it is given and not intended to continue as topic, so that the
whole utterance is something of an aside. It serves to introduce the next relevant 


153 https://osf.io/st6d8/ 

154 Strictly speaking, it was already introduced as a discourse referent when ZR29 first told the 
story to HA30. But it is now HA30's turn, and her asking for a prompt at 226.8 betrays her ignorance 
of what comes next, so gigante is not salient for her. And even though it is ZR29 who then introduc- 
es the giant, it seems likely that at 227.7, HA30 does not treat gigante as given because it is now her 
role in the game to tell the story to a third person. 
discourse referent, the condor, which is placed phrase-finally and thus in the right
position to be accented even in phrase accentuation. 


(51) TP03_KP04_Cuent_ES_0455-0618 (context for Figure 58 (also Figure 21))¹⁵⁵
General context: they are telling a story about an old man (anciano), who 
wants to go visit his granddaughter and meets several obstacles on the way. 
TP03 is telling the story for the first time.
time TP03 (the one who first tells the story) 

45.5 en eso el anciano
so the old man
46.9 sacó su lápiz
took out his pencil 
48.8 empezó a dibujar 
began to draw 
50.4 una sombra 
a shadow 
51.3 tan grande como el gigante 
as large as the giant 
52.5 que le dio miedo al gigante y el gigante huyó 
so that it frightened the giant and the giant fled 
56.1 y el anciano seguía su camino
and the old man continued on his way 
58.5 y durante el trayecto en su camino 
and while he was on his way 
60.8 apareció un cóndor 
a condor appeared 


A somewhat similar situation obtains in the case of TP03_Cuent_ES_0561 (Figure 
58), with the context in (51). TP03 tells a version of the same story as HA30 does in 
(50), but as the one telling it for the first time. 45.5-52.5 recount how the old man 
(anciano) draws a shadow on the ground that is so large that it frightens the giant 
into fleeing. 45.5-50.4 answer the QUD WHAT DID THE OLD MAN DO?, with the old 
man as the discourse topic. In 51.3-52.5, a sub-QUD to that is answered, namely 
WHAT KIND OF SHADOW WAS IT? (continued in 52.5 with a further sub-QUD, WHAT 
EFFECT DID THE SHADOW BEING SO LARGE HAVE?), with the shadow (sombra) being the 
discourse topic. With the end of the utterance at 52.5, the sub-QUDs about what the 
old man did about the giant are answered and that part of the story is concluded. 


155 https://osf.io/xp67y/ 
In 56.1, there is a return to the old man as (given) discourse topic, which he remains
afterwards (as evidenced by his subsequent pronominalization in 58.5), but the 
utterance itself is probably best interpreted as answering the QUD WHAT HAPPENED 
THEN?, which is a sub-QUD only to the overall QUD WHAT HAPPENS IN THE STORY?. An 
alternative, which would have el anciano as contrastive topic, would be to assume 
a QUD WHAT DID THE OLD MAN DO? as directly subordinate to a QUD asking after 
a set, WHAT DID THE SHADOW AND THE OLD MAN DO?, which is then answered by 
answering the two subquestions WHAT DID THE SHADOW DO? and WHAT DID THE OLD 
MAN DO?. Nothing in the context makes it plausible that such a parallelism between 
the shadow and the old man (which are two entirely different entities in terms of
animacy, prototypical agenthood (cf. Dowty 1991) and relevance to the overall story)
is intended here. Another alternative, resulting in narrow focus on el anciano, can 
equally be excluded, because that would assume a QUD WHO WENT ON HIS WAY?, but 
it is not presupposed in the context that someone went on their way. Yet another, 
with narrow focus on camino, would imply a QUD like WHAT DID THE OLD MAN CONTINUE
ON?, but nothing suggests that it is presupposed that he continued on something.
In sum, the most parsimonious QUD WHAT HAPPENED THEN?¹⁵⁶ makes the use
of the phrase accentuation here plausible, since none of the individual referents or
events expressed as accentable words (anciano, seguía, camino) are relatively more 
relevant for answering it than the others, but the utterance as a whole answers it. 


(52) ZR29_HA30_Cuent_ES_1415-1645 (context for Figure 56)¹⁵⁷
General context: they are telling a story about an old man (señor), who wants
to go visit his granddaughter and meets several obstacles on the way. ZR29 is
telling the story for the first time.
time ZR29 (the one who tells the story first)
141.5 en eso de nuevo se toca con el 
so then once more he meets the 
145.2 cun- 
145.8 cómo se llama 
what is it called
146.3 con el cóndor 
with the condor 


156 This QUD is possibly also favoured by the use of the imperfect tense seguía here, while the sur- 
rounding narrative is told using the perfect tense. Compared to the perfect, the imperfect has been 
argued to be used to convey narratively backgrounded information (Hopper 1979; López-Ortega 
2000). That would arguably make an internal information structural partition (with narrow focus 
on one of the elements) less likely. 

157 https://osf.io/nvc23/ 
147.9 y
148.4 coge s- el anciano saca su maní se lo lanza
and the old man grabs- takes out his peanut throws it to it 


151.8 como 
because 
152.5 al 


154.0 al cóndor le gusta tanto los- sus- (0.16) los manís
the condor likes the peanuts so much 
157.2 se lanza hasta mientras pasa el anciano
it launches itself [at the peanuts] while the old man passes by 
160.9 llega a su casa de su nieta de- de (0.13) saca el na- la naranja
arrives at the house of his granddaughter of of (0.13) takes out the orange 


Again a similar argument can be made for ZR29_Cuent_ES_1572 (Figure 56), with
context (52). Here, ZR29 tells the story for the first time, and describes the encounter
of the old man with the condor: he takes out the peanuts he carries with him and
throws them to the condor, who likes peanuts very much. This leaves the condor distracted
so that the old man can continue on his way. The utterances from 141.5-145.2 as 
well as 148.4 have the old man as discourse topic, answering a QUD like WHAT DOES 
THE OLD MAN DO NEXT?. 151.8-154.0 as well as the first phrase in 157.2 then have 
the condor as topic, answering first WHAT IS THE CONDOR'S MOTIVATION? and then 
WHAT DOES THE CONDOR DO?. This last QUD is answered only in the first phrase in 
157.2, by se lanza ‘he throws himself [at the peanuts]’. As can be seen in Figure 56, 
this first phrase is separated from the rest by a high boundary tone after lanza;
the rest of the utterance, which forms a separate phrase (hasta mientras pasa el
anciano ‘while the old man moves along’), is then phrase accentuated. This might
at first sight be taken as a topic switch back to the old man, el anciano, which is 
placed rightmost and thus receives the strongest pitch accent. With this QUD struc- 
ture, el anciano would then be a contrastive topic (cf. Büring 2003; Roberts 2012b). 
However, such sentence-final subject NPs as el anciano here have been proposed to 
signal decreased salience or continued topics in Spanish (Ocampo 2003, 2010); they
are called antitopics by Lambrecht (1994) and are not contrastive topics. The lack 
of parallelism in the utterances about the condor and the old man¹⁵⁸ also does not


158 Syntactically, 151.8-157.2 form a complex sentence together: 
(i) [[como al condor_i le gusta tanto los manís] ∅_i se lanza [mientras pasa el anciano]]


This sentence has an empty subject referring to the condor, realized only in the causal subordinate 
clause como al condor le gusta tanto los manís, while el anciano is the realized subject in the other 
subordinate clause headed by hasta mientras. A parallel QUD structure like WHAT DID THE CONDOR 
support an interpretation as contrastive topics. It seems much more likely that el
anciano here simply realizes a given referent that only subsequently will be continued
as a topic in the next utterance (160.9), meaning that the phrase answers a
QUD like “what happened meanwhile?”, which is not a sister to, but subordinate
to the current QUD, “what does the condor do?”, and thus is not at-issue in this
utterance.¹⁵⁹ An alternative QUD asking only after the old man (“who passed along
meanwhile?”) is out because the action of passing along is not presupposed.

As an intermediate result, the three context examples have shown the use of 
phrase accentuation in contexts where only the final element in the phrase is fore- 
grounded (50), where the entire phrase as a whole is the answer to the current QUD 
(51), and where the entire phrase as a whole is arguably the answer to a QUD that 
is not current, i.e. backgrounded (52). 


AND THE OLD MAN DO? would thus be answered once by this empty subject in the main clause ([el 
cóndor] se lanza) and once by the finally placed but realized subject in the subordinate clause 
(mientras pasa el anciano). If this were the intended QUD structure, the syntax would therefore be 
maximally unhelpful in exposing its parallelism. 
159 Riester (2019: 180–181) points out that many types of non-at-issue material such as adjuncts,
evidentials and evaluatives are not at-issue only with respect to the current QUD, but provide an answer
to a different (subordinate) QUD. In many cases where the non-at-issue material is utterance-final,
it can be difficult to establish whether it is really not at-issue or simply asserting the
answer to the next QUD. In our case, it seems decidable: 
(i) Q1: What does the condor do?
A1: se lanza 
Q2: What happens meanwhile? 
A2: hasta mientras pasa el anciano 
(ii) Q1: What happens meanwhile?
A1: hasta mientras pasa el anciano 
Q2: What does the condor do? 
A2: se lanza 


While (i) seems a plausible question-answer structure for ZR29 Cuent ES 1572, (ii) is incoherent. 
This would also be the case if the utterance was ordered as hasta mientras pasa el anciano (el con- 
dor) se lanza. That's because the clause headed by hasta mientras is not just syntactically subordi- 
nate to another, but also denotes an action or event that must be interpreted in relation to another 
action or event (with the nature of the relation here being their temporal coincidence), but not the 
other way around. It would also be possible to utter se lanza hasta mientras pasa el anciano with 
the hasta mientras part at-issue, but only as answer to a current QUD such as WHEN DOES HE THROW 
HIMSELF FORWARD?, which implies a context set that has already been sufficiently restricted such 
that in all possible worlds contained in it, the proposition “the condor throws himself forward" is 
true; in other words, when the QUD DOES HE THROW HIMSELF FORWARD? has previously been an- 
swered affirmatively and it is thus still superordinate to the current QUD. 
5.1.3.1.2 Coherent phrasing: Complements to relative clauses and others

The phrase accentuation also occurs on phrases that realize complements to rela- 
tive clauses, such as in MS27_Cuent_ES_1520 (Figure 55), with context (53). MS27 is 
re-telling the story after being told it by CF28. 


(53) MS27_CF28_Cuent_ES_1409-1582 (context for Figure 55)
General context: they are telling a story about a giant (gigante), who meets 
several animals. MS27 is now re-telling the story, after she was told it for the 
first time by CF28. 
time MS27 (the one who re-tells the story) 
140.9 y el gigante
142.3 cogió su sombrero y siguió su camino 
and the giant grabbed his hat and continued on his way 
145.5 en eso
147.4 ve
148.4 un funeral
so then he sees a funeral 
149.8 donde estaban las flores 
where the flowers were 
152.0 y que los animales estaban maltratando
and which the animals were mistreating 
155.3 eh el gigante decide matar a los animales
uh the giant decides to kill the animals 


In 145.5-148.4 she recounts how the giant gets to a funeral (un funeral). In 149.8 
and 152.0 then follow two utterances that realize relative clause complements, first 
(149.8) to un funeral itself (un funeral [donde estaban las flores]) and then (152.0) 
to the NP las flores that is part of the first relative clause (las flores [(y) que los 
animales estaban maltratando]). Both of these relative clause phrases are realized 
in phrase accentuation. It seems plausible that this is due to an effort to prosodi- 
cally optimize such complement clauses as coherent (single) prosodic phrases in 
order to signal their relation as a whole to the preceding relative head; this is in 
principle better achieved with a phrase that has only a single pitch accent that also 
serves to delimit the phrase than by one with several pitch accents. It seems possi- 
ble to prefer achieving such coherence over accenting each accentable word, even 
if those words introduce new referents.¹⁶¹ This seems at least a contributing factor
here. However, looking at more examples from this corpus and other corpora, we 
see that phrase accentuation is used (in particular, but not only) by MS27 and CF28 
in a wide variety of utterances in their Cuento corpus that are certainly not all 
relative clause complements. For instance, it also occurs in the utterance following 
the two relative clause complements just discussed, in 155.3. It also appears in the 
initial utterances by both of them, i.e. when CF28 begins telling the story for the first 
time (cf. Figure 60) and at the beginning of MS27's retelling of it (cf. Figure 61). Both 
of these display a somewhat remarkable initial intonation: a pitch peak with strong
excursion is realized on había un or había, then a steep fall occurs which ends on 
the first syllable of gigante and pitch then continues at a lower overall level. This 
pronounced pitch movement could result from an LH* pitch accent followed by a 
high boundary tone, and the following steep fall is then due to the leftward align- 
ment of the next upcoming L tone that is already familiar from section 5.1.1. 




Figure 60: CF28_Cuent_ES_0012¹⁶² (había un gigante que estaba durmiendo debajo de un árbol ‘there
was a giant who was sleeping under a tree’; declarative with phrase accentuation).


161 Although all the discourse referents mentioned in these utterances are introduced for the first 
time in this re-telling of the story, they are realized as definite NPs, as if given. It is possible that the 
speaker treats the re-telling of the story less like her own telling, pretending that the listeners don't 
know it, and instead more like a memory task she has to perform: how good she is at remembering 
the story she has been told, an ability which the listeners (the experimenters) are assessing by com- 
paring it to the original version known to them. Great care was always taken by the experimenters 
to avoid the impression that the productions by the speakers would be assessed in such a way; we 
usually said at the beginning of the experiment session that this was explicitly not the case and also 
on occasion during the course of the session. However, since some speakers (like MS27 and CF28 
here) were students at the time of recording, it is not impossible they associated the experimental 
tasks with similar tasks known from education contexts, at which they would be assessed in their 
performance. 

162 https://osf.io/mx2k7/ 
Figure 61: MS27_Cuent_ES_1207¹⁶³ (había un gigante que estaba descansando debajo de un árbol ‘there was
a giant who was resting under a tree’; declarative with phrase accentuation).


Such a phrasing is odd in terms of an expected correspondence to syntactic or infor- 
mation structural categories. It might more readily be expected to separate había 
un gigante from the rest, mapping prosody to a topic-comment structure. As it is, 
the proposed phrasing goes against a mapping of prosody with information struc- 
ture and syntax, if the boundary really occurs between the determiner un and the 
noun gigante. Yet in both utterances under discussion here, the picture seems to 
unambiguously point to this conclusion. I will call such increased scaling on the 
initial word in a large phrase an “initial boost”. It will reoccur in the double topic-utterances
treated in section 5.2.

The remainder of these utterances exhibit phrase accentuation. It is unambig- 
uous that both these utterances end with a (LH*) pitch accent on the final word 
árbol, followed by a low final boundary tone, L%. In CF28’s version, the excursion of 
none of the pitch movements on any of the stressed syllables between gigante and 
árbol really exceeds the fluctuation due to microprosody happening also on all the 
other syllables, and globally, pitch follows a decaying trajectory until a lowest point 
shortly before the final stressed syllable on árbol. On gigante, no pitch accent on 
the stressed syllable can be identified, but pitch is suspended perhaps due to a high 
boundary tone at its end, which auditory inspection supports. Thus, here at least 
the stretch que estaba durmiendo debajo de un árbol seems to form a single phrase 
with only one final pitch accent. MS27's utterance is an intermediate version where 
compressed intermittent pitch movements likely due to intonational tones can be 
more readily identified. At the end of both gigante and of descansando, final rises 
occur that are not due to microprosody, but are also not aligned with the stressed 
syllables of these words. These are cases of the rightmost type (d) of phrase accentuation
in Table 10, without pitch accents associated with prominent positions and
only phrase-delimiting tonal movement. They can be seen as a type of continuation
rise; in final position (at the end of the utterance), they do not occur here because
of the final L%. Cf. the analysis of rising and falling contours in the Quechua data in
section 6.1.1. A further example is Figure 62.


163 https://osf.io/9c6nt/




Figure 62: SO39_MT_ES_3351 (ya del corderito has pasado a unas personas que están esperando
el tren, no ‘ok from the little lamb you passed some people that are waiting for the train, right’;
confirmation-seeking question with “phrase accentuation”).


Here there is a pitch accent with strong excursion followed by a final high boundary 
tone on corderito, then the stretch has pasado a una personas is phrase accented 
only with a final high boundary tone, all other movement being due to micropro- 
sody, and the final stretch que están esperando el tren no is phrase accented with 
a pitch accent only on tren, followed by the LH96 final boundary tone for polar 
tag questions (see section 5.1.2.1). The phrasing produced in these two examples 
(MS27 Cuent ES 1207, Figure 61 and SO39 MT ES 3351, Figure 62) and others 
like them further corroborates the suggestion that phrase accentuation can occur 
when it is worthwhile to signal a prosodic structure that is above the level of the 
individual prosodic word as a whole instead of signaling its internal structure. In 
SO39 MT ES 3351, it helps signal the separation between an NP referring to an
anchoring landmark (del corderito) and a proposition that serves as an instruction
for how to proceed from that landmark onwards (similar to topic-comment
structures), and within this instruction it further helps signal the relative complement
clause (que están esperando el tren) as belonging as a whole to its preceding
head noun (unas personas). In MS27 Cuent ES 1207 it also separates a head
noun (un gigante) from the complement clause (que estaba descansando debajo de
un árbol), and within this complement clause it further separates the verb phrase
from the locative adverbial (see (54) and (55), where square brackets indicate
right phrase boundaries, syllables in italic are stressed and syllables in bold also
accented¹⁶⁵).


(54) ya del corderito] has pasado a unas personas] que están esperando el tren no]
            LH*H-                        H-                               LH* LH%


(55) había] un gigante] que estaba descansando] debajo de un árbol]
     LH*H-     H-                  H-                        LH* L%


I argue that this is then the generalization we can make: the phrase accentuation 
is preferentially used whenever a structure with rightmost prominence needs to 
be signaled that is intermediate between that of the level of the individual pro- 
sodic words and the whole utterance, and if this can be done at the expense of 
deemphasizing individual prosodic words and the discourse referents they encode 
(up to producing only phrase-edge tones and no pitch accents at all) without losing 
contextually relevant information. This entails that the phrase accentuation will 
preferentially be used only in phrases where either final focus or broad focus inter- 
pretations are allowed from the context, as observed. It also strikes a parallel to 
the phenomenon of ‘dephrasing’ described for ‘edge-prominent’ languages like 
Japanese and Korean (Pierrehumbert & Beckman 1988; Jun 1993, 2005b; Venditti 
et al. 1996; Ladd 2008; Igarashi 2015; cf. section 3.4.6): words that are less promi- 
nent are not phrased separately but in a larger phrase together with a more prom- 
inent word. In comparison with the Japanese case, it is mainly the position of the 
prominent word within the larger phrase that differs: there, the most prominent 
word is leftmost in the phrase, whereas here it is rightmost. It also ties in with 
another relevant observation: the phrase accentuation occurs more frequently in 
the Cuento than in the Maptask corpora, and this is partly because utterances are 
generally longer in Cuento than in Maptask.¹⁶⁶ In those longer utterances the phrase


165 In sections 5.2 and 5.3 it will be shown how the alignment of the H as a phrasal boundary tone 
nonfinally and with the stressed syllable finally results naturally from the additional presence of 
an IP-level boundary tone. There, the categories of the prosodic hierarchy involved will also be 
discussed in more detail. 

166 Most likely, this is in itself due to better possibilities for planning ahead. In Maptask, speakers 
interact with each other constantly, must adapt to changing epistemic conditions, and can update 
the common ground only as a result of constant negotiation with each other. In Cuento, on the 
other hand, the conversational mode is much more monological. The first speaker in particular, 
but also the second, have all the time they need to tell the story. Unless they make mistakes, their 
right to speak and to update CG is in no danger of being contested, the stage is theirs. This opens up

accentuation is more widespread, and this makes sense if it helps to signal intermediate
phrasing structures that are mapped to information structure and syntax:
they are simply absent in shorter and less structurally complex utterances. Note 
that in Quechua Cuento (cf. section 6.4.1), very similar conditions obtain, and they 
are also argued to be mainly responsible for the observed phrasing of larger speech 
sequences, with a rise-falling pitch contour that is in fact very similar to that of 
phrase-final phrase accentuation seen here. 

Optionality obviously plays a large role in the use of the phrase accentuation
in these spontaneous data, not only regarding whether it occurs at all, but also regarding
the phrasal separations it creates: on virtually the same sentence under very
similar context conditions, CF28 Cuent ES 0012 (Figure 60) produces the entire 
relative clause complement in one phrase, while MS27 Cuent ES 1207 (Figure 61) 
divides it further in two. In section 5.2, more complex and somewhat more con- 
trolled data will be analyzed. The results will suggest that as here, while certain 
phrasing divisions due to information structure are quite general, there is still a 
considerable space for individual variation. 


5.1.3.1.3 Phrase accentuation as (individual) default? 

So far, we discussed phrase accentuation (with either a single pitch accent or only 
a boundary tone) as a phenomenon whose occurrence is influenced by discourse 
contextual and information structural factors, and that is preferentially used when 
units of discourse that are larger than single referents and the expressions they 
encode (at the size of prosodic words) are to be emphasized at the expense of these 
smaller units. Such a view does not predict phrase accentuation to occur on single 
words. This is true for the Maptask and Cuento corpora studied here. However, there 
is some evidence in the Elqud corpus that for some speakers, in particular NQ01,
XJ45 and ZE55, phrase accentuation has become almost generalized. Especially XJ45
produces phrases with only boundary tones almost as a default in Elqud, and this
not only on groups of several words, but also occasionally on single prosodic words. 
He realizes several words together in one such phrase even when from the dis- 
course context it is clear that not the final but a prefinal word in this phrase bears 
the highest information load (i.e., with pre-final narrow instead of final narrow or 


the possibility for planning and producing utterances containing larger coherent chunks of infor- 
mation than in Maptask. Cf. section 6.4 where a similar argument is made to account for prosodic 
differences between utterances from Quechua Cuento and Maptask, with absence of utterance-in- 
ternal IS-partition in Cuento there resulting in contours that bear many similarities with the phrase 
accentuation utterances here.

broad focus). This can be seen exemplarily in his utterances XJ45 ELQUD ES 16
(Figure 63), XJ45 ELQUD ES 19B (Figure 64), and XJ45 ELQUD ES 17 (Figure 65).


Figure 63: XJ45 ELQUD ES 16¹⁶⁷ (falso el hombre está pasando vestido entre las casas ‘wrong the man is
passing between the houses with clothes on’; declarative with phrase accentuation).


Figure 64: XJ45 ELQUD ES 19B¹⁶⁸ (falso acá hay tres carros verdes y cinco cuyes negros ‘wrong here
there are three green cars and five black guinea pigs’; declarative with “phrase accentuation”).


In XJ45 ELQUD ES 16 (Figure 63), the only word that has a pitch peak aligned with
its stressed syllable is the final one, casas.¹⁶⁹ All other words are either produced
entirely as part of a low pitch stretch (falso, está, entre), or a final boundary rise 


167 https://osf.io/h6wjb/

168 https://osf.io/ayp93/ 

169 As stated above, this will be analyzed as a result of the additional presence of the L% in sections
5.2 and 5.3.

is realized on their final posttonic syllable, with the preceding stressed syllable
clearly not the target of the peak (hombre, pasando, vestido). Instead, the elbow 
starting the rise is placed somewhere on the stressed syllable. Even on the single 
word vestido such a phrasal contour is realized. Vestido here encodes the predicate 
that constitutes the contrast to the experimental provocation,¹⁷⁰ and it is likely that
producing it in its own separate phrase is a means for signalling this. 


Figure 65: XJ45 ELQUD ES 17¹⁷¹ (falso los ratones se lo comieron todo el queso ‘wrong the mice ate all
the cheese’; declarative with “phrase accentuation”).


In XJ45 ELQUD ES 19B (Figure 64), falso acá hay, tres carros, verdes and (y) cinco
each are realized as a separate phrase with a finally rising phrase accentuation 
contour, whereas in the last phrase, cuyes negros, the peak is again located on the 
final stressed syllable. Note that the final peak on verdes is scaled much higher than 
that at the end of the other phrases, coinciding with the separation between the two
conjoined sentences that make up the utterance together. Section 5.2 will discuss 
evidence that the scaling of pitch peaks, both of accents and boundary tones, is 
used to signal boundaries at different levels of the prosodic hierarchy. Although 
they should both be the locus of the correction, the numerals tres and cinco are 
treated differently by the prosody here.¹⁷² While tres is clearly realized as part
of the low stretch in the phrase tres carros (with final peak on the posttonic syllable
of carros), cinco is realized as the rightmost word in such a phrase, with a
final peak (and a new phrase clearly beginning with cuyes). That is to say, cinco
is aligned prominently at the right edge of the phrase, while tres is realized at its
left edge. It seems that in Elqud in general, the relation between what logically is
the locus of the correction and its prosodic realization is not straightforward, but
instead mediated by both prosodic and syntactic structure as well as dependent
upon interpretation of the context by the speaker, and even then the result is still
subject to further variation (see section 5.2 for a detailed analysis of a subset of the
Elqud utterances).


170 The provocation is El hombre está pasando calato entre las casas “the man walks naked between
the houses”, while the animated image shows a man in formal attire and with a bunch of
flowers moving between houses. The expectation was that a correction would concentrate on the
state of dress of the man.

171 https://osf.io/7znmr/

172 The provocation is aquí hay un carro verde y tres cuyes negros “here there is one green car
and three black guinea pigs”, while the image shows three green cars and five black guinea pigs.
The expectation was that the correction would concentrate on the number of cars and guinea pigs.

In XJ45 ELQUD ES 17 (Figure 65), the relevant section concerns the last noun
phrase, todo el queso. It is realized with phrase accentuation, with the quantifier
todo realized completely on the low stretch before the pitch peak. Yet it is at the
same time the unique location for the correction relative to the provocation, which
is los ratones se han comido un poco del queso “the mice have eaten a bit of the
cheese”, while the animated visual stimulus shows them to have eaten the whole cheese.
This is a further case in point that the mapping between information structure,
metrical structure, and its cueing via pitch should best be described in preferential,
but not categorical, terms. All of the examples here demonstrate how for XJ45, the
phrase accentuation variant is a virtual default, which seems to also mean that
it lacks some of the functionality it has for other speakers. XJ45's Elqud examples
represent a shift to an edge-prominent prosody instead of the head-prominent one
exemplified by the main accentuation (in Jun 2005d, 2014b's terminology, see also
sections 3.4 and 7.4.1), but using the same underlying tone sequence. In sections 5.2
and 5.3, this shift will be analyzed also in the context of what it implies for the prosodic
hierarchy, and section 7.4 will also establish further connections to Quechua.
In the following we will discuss utterances in which context conditions that impose
an internal IS-partition on utterances in which focus is not final (like in XJ45's utterances
here) do have an effect on prosody.


5.1.3.2 Deaccentuation in declaratives 

In the Huari Spanish data, deaccentuation does not only occur in alternative ques- 
tions or wh-questions (cf. 5.1.2.3 and 5.1.2.4), but also in declaratives. It may occur 
on postnuclear material, i.e. when the highest prominence is realized nonfinally 
within a larger phrase or utterance. Typically, as in other Spanish varieties, this 
can happen when context allows for an interpretation whereby an utterance is 
partitioned internally so that at-issue material precedes non-at-issue material. It 
is clearly an optional process. As just seen, in XJ45's phrase accentuation exam- 
ples, these context conditions do not effect deaccentuation. To demonstrate that 
this optionality also exists in utterances with main accentuation, I first provide two
examples within their contexts in which a given or backgrounded constituent is
preceded by one which is narrowly focussed and where no deaccentuation occurs.


Figure 66: OA32_MT_ES_1443¹⁷³ (declarative with main accentuation). Cf. context (56).


(56) XU31 OA32 MT ES 1344-1476 (context for Figure 66)¹⁷⁴
time   OA32 (the one with the path)
134.4  y un zorro vas a encontrar
       and a fox you'll find
136.9  y por el
       and by the
138.2  ahí vas a encontrar zorro y
       there you'll find fox and
139.7  debajo del zorro nomás ahí hay y ahí encontrar va
       just below the fox there it is and there you will find
142.0  un nube
       a cloud
143.2  de nube
144.8  más arribita nomás váyalo y un
       from the cloud just a little further above go and a
146.4  LH*H-      LH*    L%
       olla vas a encontrar
       pot you'll find


The first is OA32 MT ES 1443 (Figure 66), with context (56). The relevant part is y 
un olla vas a encontrar. The context shows that vas a encontrar has been uttered 


173 https://osf.io/m7vuf/ 
174 https://osf.io/udz93/

before and that OA32 formulates the introduction of each new successive landmark
referent in the maptask with a variation upon that phrase. It is therefore plausible 
to take vas a encontrar as backgrounded here because it is already part of the cor- 
responding QUD (i.e. not at-issue) which asks for each landmark, something like 
“what are you going to find?”. The answer to this is olla,¹⁷⁵ which correspondingly
would be in narrow focus. Despite this, there is a very clearly identifiable LH* pitch 
accent on encontrar. 


Figure 67: SO39 MT ES 0743¹⁷⁶ (declarative with main accentuation). Cf. (57).


(57) SO39 MD40 MT ES 0562-0775 (context for Figure 67)¹⁷⁷
time SO39 (the one with the path) MD40 (the one without the path)
56.2 con por la persona que está 
agarrando su bolsa 
with by the person holding 
their bag 


175 Note the rise on the preceding article un as well as the short break before olla. This seems quite
different to hesitations in these corpora, where pitch characteristically drops to low on the element
before the “moment of interruption” (cf. Ginzburg et al. 2014, i.e., here on un). That element is
usually severely lengthened and then optionally followed by a silent break or filled pause before
the “continuation”, i.e. the element that continues the “normal” flow of speech. See del murciélago
in Figure 23 for an example. In contrast, here un is produced with a strong pitch rise, indicating a
high boundary tone, and it is not lengthened to the degree expectable in hesitations. An interesting
interpretation might be that here a constituent is cued as aligned with the left edge of a phrase to
signal that it is in focus.

176 https://osf.io/uqa68/ 

177 https://osf.io/qnjp6/

60.4 por su [encima


above them 
60.6 por de]bajo ta¹⁷⁸
it's below 
61.7 no por [en]cima 
no above 
62.0 [por en-] 
abo- 
62.7 (laughter, whispering (9.4 s)) 
72.1 por encima del perro 
above the dog 
74.3 LH* LH* L% 
por debajo del perro 
below the dog 
75.9 ya por debajo del perro 


right below the dog 


The second example is SO39 MT ES 0743 (Figure 67), with context (57). It shows
that it is possible to realize pitch accents not only on words following prefinal narrowly
focused constituents, but also following narrow corrective focus (that is, if the
answer to the current QUD asks narrowly for a constituent and the assertion made
by the speaker bears the [REVERSE] operation with regard to the proposition on the
table, cf. Farkas & Bruce 2010). At 72.1, MD40 asks whether the path leads above the
dog. This puts the proposition “above the dog” on the table, with a bias for confirmation,
according to Farkas & Bruce (2010). By accepting this proposition as on the
table, SO39 also accommodates the presuppositions MD40’s utterance makes: the
existence proposition that the dog exists (as a landmark) and a proposition that the
dog is the relevant upcoming landmark. When SO39 responds at 74.3 without challenging
the presupposition, el perro is thus backgrounded, because the discourse
referent it refers to is already given and because the current QUD must already
include it: por debajo del perro is a correction to the proposition on the table, “above
the dog?”, but at-issue as the locus of the correction (or the [REVERSE] relation) is
only the locative relation, not the landmark referent “the dog” itself, since it stays
the same in the question and the correcting response and is presupposed.


178 A shortened form of “está”. There is no acoustic trace of the first syllable. As can be seen from
several other examples, this is not a one-time occurrence but seems to be the usual realization of
this verb form for several of the younger speakers (e.g. OA32, OV37, MD40).

Thus from the context por debajo here has corrective focus, while del perro is
backgrounded. Nevertheless, perro is pitch accented here with an LH* (with the 
L truncated, cf. section 5.1.1.2). An utterance with pitch accents on each prosodic 
word, i.e. in the main accentuation, is therefore ambiguous between different 
information structural configurations. This suggests that broadly, a similar relation 
between prosodic form and information structure holds for Huari Spanish as has 
been discussed for other varieties of Spanish in section 3.7.3.1. This is a relation 
mediated via metrical structure and probability or preference. In what follows we 
will consider cases of similar contexts where deaccentuation does take place. 


5.1.3.2.1 Deaccentuation in reversals vs. on given material 

The contexts for the two utterances in which we just observed that deaccentua- 
tion did not take place have been treated differently in the literature. Deaccentuation
after corrective focus (as in SO39 MT ES 0743, cf. below for what corrective
focus is in terms of the model by Farkas & Bruce 2010) has been attested previously 
for Spanish in Hualde (2002); Gabriel (2007); Vanrell & Fernández Soriano (2018), 
amongst others. It has to be kept apart from deaccentuation of repeated (given) 
material in the absence of correction (as in OA32 MT ES 1443), which Cruttenden 
(2006); Ladd (2008); Hualde & Colina (2014) all agree is at best marginal in Spanish, 
in contrast to e.g. English or German. In both Hualde (2002) and Vanrell & Fernán- 
dez Soriano (2018), the phonetic variability between strongly reduced and fully 
absent pitch accents postfocally has also been noted. It is there treated as a single 
phenomenon with an explicitly agnostic stance as to whether it really constitutes 
deletion of accents phonologically. Here this holds as well: as with the phrase accen- 
tuation, some cases exist that are intermediate in their phonetic realization, where 
the nuclear pitch accent is still followed by further ones, but those are severely 
reduced in scaling. I will refer to the whole phenomenon here as deaccentuation
for easier reference, but note that it seems in principle easier to derive a totally
flat pitch contour as a variant realization for accents somehow marked as reduced
in scaling phonologically than to explain why phonologically fully deleted accents
should manifest a variant realization as severely compressed ones. Thus postnuclear
reduction might be the better umbrella term. In the context of intonation
in Germanic languages, where deaccentuation has been accepted for longer than for
Spanish, it has also been shown to manifest variably as more or less strong pitch
compression/reduction (cf. Kügler & Féry 2017 for German), and accented¹⁷⁹
positions have been shown to be preserved by other prosodic means (cf. Beaver et al. 2007 for English).


179 Accented, not just stressed, positions. Beaver et al. (2007) compared the acoustic correlates of
stressed syllables of words bearing second-occurrence focus (usually said to be deaccented when

Figure 68: ZE55 ELQUD ES 55¹⁸⁰ (no se lo han comido todo hm un poco solo han comido ‘they haven't
eaten it all hm only a bit they've eaten’; declarative with deaccentuation after poco).


It is thus quite possible that the absence of deaccentuation in contexts like that of 
OA32 MT ES 1443 (56) is much less variable than in those of SO39 MT ES 0743 
(57), and that a difference in relative polarity between provocation and response is 
an important factor making deaccentuation more probable also in Huari Spanish. 
More specifically, a response might have to be a partial reversal (responding to a 
polar question) or partial denial (responding to an assertion; Farkas & Bruce 2010: 
100–102, 105) of the provocation, with the contested material realized prefinally,
and other material following, for this following material then possibly to be deaccentuated.
Just as with the phrase accentuation, only some speakers deaccent in our
data. In the Spanish Maptask and Cuento corpora, it is only found with TP03, KP04,
SG15, QF16 and ZR29. As with the phrase accentuation, it is hard to say whether this
means that this is really interspeaker variation, but as we just saw, at least the two
speakers OA32 and SO39 have been found not to deaccentuate in contexts where it
could be expected.

In comparison, consider ZE55 ELQUD ES 55 and SG15 MT ES 1815 (Figures 68
and 69), the latter with context (58). ZE55 ELQUD ES 55 (Figure 68) is an example of
a partial denial with deaccentuation from Elqud. The provocation stated that the 
mice had eaten all of the cheese while the visual stimulus showed them to have 
eaten only some of it. In the first part of the response, the provocation is denied (no 
se lo han comido todo). The second then specifies the alteration necessary for the 


following the first focus) with those of words bearing no focus at all, and found significant differences
not in pitch, but in duration and intensity between them. Stressed but un- or deaccented
syllables in Spanish have also been found to preserve durational and intensity correlates of stress 
(Ortega-Llebaria & Prieto 2007, 2011; Torreira et al. 2014). 

180 https://osf.io/vxceg/

Figure 69: SG15 MT ES 1815¹⁸¹ (por medio tienes que pasar ‘through the middle you have to go’;
declarative with deaccentuation after medio). Cf. (58) for context.


speaker to accept the proposition. The material denoting the contested part of the 
proposition on the table is realized first and with an exhaustivity marker (un poco 
solo), and the following material denoting the uncontested part is deaccented. 


(58) SG15 QF16 MT ES 1723-1830 (context for Figure 69)¹⁸²
General context (cf. Figures 190 and 191 in Appendix B): the lamb, the mil- 
lionaire, the rock, the bat (murciélago) and the dungheap are in different 
locations in the two maps. That is why QF16 cannot understand SG15's 
instructions well when told to go between the bat and the pot (por el medio¹⁸³
del murciélago por lado de olla) after passing underneath the skunk; that is 


181 https://osf.io/skfvr/ 

182 https://osf.io/ntxpa/ 

183 SG15 here consistently uses por el medio del murciélago on its own to mean “between the bat
[and something else, in this case the pot]”. Possibly, this is a calque from Quechua, where in the
maptasks, X-pa (Y-wan) chawpi-n-pa (X-GEN (Y-INST) middle-3-GEN) is sometimes used to mean
“between X and Y” (lit. “in the middle of X with Y”) without uttering the Y-part in brackets if it
was mentioned previously. Chawpi by itself means “center”, “middle”, “intermediary”, “point of
separation” (Parker & Chavez Reyes 1976: 52; Carranza Romero 2003: 50). It is not clear whether
QF16 completely understands this usage, because at 203.3 he asks, entonces cruzo por la mitad del
animal, del murciélago? “so then I cross through the middle of the animal, the bat?”. SG15 gives a
confirmation token then, but on her map, the path goes between the bat and the pot (not across
the bat), and this is what she also states at other times. Thus, whether the relation expressed as
por medio here by both is really also understood by both to mean the same thing or two different
things (“between the bat [and something else]” vs. “through (the middle of) the bat”) is not entirely
clear. Fortunately, this is not important for the discussion here, since they treat it as if they were
meaning the same thing.

why he is astonished that he has to do a full turnaround (una vueltaza) at 172.3.
They previously already tried to get past this part in the instructions when SG15 
said to go between the bat, then they backtracked and have now arrived at it
again.

time   SG15 (with the path)   QF16 (without the path)
172.3  asu una vueltaza me tengo que dar
       jeez a complete u-turn I have to do here
173.9  mhm
175.1  de ahí
       from there
175.6  para eso tengo que pasar por encima del murciélago
       for that I have to go above the bat
178.5  no
179.0  por su medio
181.5  por medio tienes que pasar
       no through it through the middle you have to go


In (58), QF16 states that he has to go above the bat at 175.6. This puts I HAVE TO GO 
ABOVE THE BAT on the table. SG15 objects to this proposition on the table in 178.5 
and 179.0, partially denying it. When she utters por medio tienes que pasar, we can 
assume a QUD like WHERE IN RELATION TO THE BAT MUST YOU PASS?, where only por 
medio is at-issue as the contested part of the proposition and tienes que pasar is 
backgrounded because it is repeated and uncontested. The at-issue / non-at-issue 
separation here mirrors the separation between accented and deaccented material 
in the utterance. 

Compare this to SG15 MT ES 2272 (Figure 70), with context (59), which is
segmentally nearly identical to (58). A little later in the game at 225.1, QF16 is now
putting the proposition denoted by por medio del murciélago on the table to be 
confirmed. SG15 confirms at 226.7 and 227.2, repeating por medio tienes que pasar. 
Here, there is no partial reversal interacting with a division between at-issue and 
non-at-issue material between medio and tienes que pasar. Tienes que pasar can be 
seen as given because the action of having to pass a landmark object is accessible 
from the context and almost the same phrase has been uttered earlier already, 
it being basically the default action in this game. The context is therefore quite 
similar to that seen for OA32 MT ES 1443. That SG15 does not deaccent tienes que 
pasar in Figure 70 even though she was seen to deaccent in Figure 69 supports 
the hypothesis that even though there is also an information structural division
between por el medio and tienes que pasar here, such a division has to interact with
the difference in relative polarity between provocation and response that is part 
of a partial denial or partial reversal, but not of a confirmation. Deaccentuation 
seems more likely to take place on the uncontested material in such a reversal or 
denial following earlier contested material, than just on given material following 
new material. 


(59) SG15 QF16 MT ES 2222-2290 (context for Figures 70 and 71)¹⁸⁵
General context: same as for (58), but somewhat later. They are once again
going through the instructions on how to pass the bat.
time   SG15 (the one with the path)   QF16 (the one without the path)
222.2  ya
222.7  estamos en-
223.6  en la mitad de
       right so we are in- in the middle of
225.1  por el medio del murciélago me has dicho
       through the middle of the bat you said
226.7  mhm
227.2  por medio tienes que pasar
       through the middle you have to go
228.6  ya

Figure 70: SG15 MT ES 2272¹⁸⁴ (declarative with no deaccentuation). Cf. context (59).


184 https://osf.io/4xtd2/ 
185 https://osf.io/vun95/

5.1.3.2.2 Deaccentuation on “parentheticals” and evaluative additions

Neither the presence of a denial/reversal (generalized to a [REVERSE] feature in the
response in Roelofsen & Farkas 2015) nor givenness, however, is actually a necessary
precondition for deaccentuation. There are cases of what can be described as
“parentheticals” that fulfill neither of these conditions, as the following examples
will show.




Figure 71: QF16 MT ES 2251¹⁸⁶ (declarative with deaccentuation after murciélago). Cf. context (59).


In QF16 MT ES 2251 (Figure 71, in the same context (59) as before), me has dicho 
is realized with flat and low pitch following a steep fall after the normally accented 
por el medio del murciélago. This example represents a number of cases where a 
verb phrase with an evaluative, epistemic or reportative verb that takes a senten- 
tial complement and is semantically superordinate to the proposition encoded by 
the material preceding it, but is expressed utterance-finally like an adverbial, is 
deaccented. The utterance here seeks confirmation for the instruction por el medio 
del murciélago (this part is what SG15 responds to), while the proposition that SG15 
told QF16 about this, expressed by me has dicho, is not at-issue. As Roberts (2015, 
2017) argues in the context of how the meaning of expressions of belief relates to 
the meaning of the propositions that are the target of these beliefs, such evaluative 
predicates can be at-issue if they are encoded with a full lexical expression,¹⁸⁷ but


186 https://osf.io/vht2m/ 

187 As opposed to encoded by a morphological affix, a particle, or intonation. The concurrent 
claim is that such doxastic, epistemic or evidential attitudes to a proposition, if expressed by af- 
fixation, a particle, or intonation, can never be at-issue; a conclusion which is supported by Faller 
(2014) for the Cuzco Quechua reportative evidential and which also seems to square with our own 
Quechua data.

usually, they are not.¹⁸⁸ Ortega-Llebaria & Prieto (2007, 2011) study the acoustic correlates
of stress and accent in Spanish (and Catalan), making use of the difference in
intonation between what they call “declaratives”, on the one hand, and “parentheticals”
(2007) or “reporting clauses” (2011), on the other. In their experimental sentences,
the latter are instances of exactly such utterance-final evaluative adjuncts.
Ortega-Llebaria & Prieto (2007, 2011) take it as a matter of course that what they
call “parentheticals” or “reporting clauses” are deaccented, after a statement to that
effect by Navarro Tomás (1968 [1944]: 115–116), where these clauses are defined as
“la intercalación de un elemento incidental, con carácter propio, ajeno a la estructura
melódica de la frase en que se encuentra”.¹⁸⁹ This is the definition that Ortega-Llebaria
& Prieto (2011: 78) quote, notably a definition of which intonational form
(namely, the melodic structure being different from that of the rest of the utterance)
is already a part. While Ortega-Llebaria & Prieto (2007, 2011) state that such parentheticals/reporting
clauses are produced in a low monotone pitch without excursions,
Navarro Tomás (1968 [1944]: 116) actually points out that there are also cases
of such parentheses that are not deaccented, but where instead accent and pronunciation
are strengthened relative to the rest of the utterance, and that this serves


188 Simply because the contents of beliefs or evaluations seem to be a more frequent topic of 
conversations than the nature of these beliefs or evaluations as opposed to their content, which is 
what needs to be the case for them to be at-issue. However, they can certainly be at-issue, which can 
be seen from the following examples (adapted from Roberts 2015: 47–48):
(i) Context: Why hasn't Louise been coming to our meetings recently? 
a. Henry believes she left town. 
b. She's left town, Henry thinks. 
Possible replies: 
c. But she hasn't. I saw her at a supermarket yesterday. [targeting the content of the belief] 
d. No he doesn't. He told me he saw her at a supermarket yesterday. [targeting the belief state 
as opposed to the content] 


The fact that (i d) is perfectly fine as a response to (i a, b) shows that these belief states can be made to
be at-issue. Equally, the speaker uttering (i a) or (i b) should be able to make the belief state at-issue
in their own alternative utterances (ii a, b), and this should be accompanied by more accentuation
on the postposed verbal complex Henry thinks in (ii b) than in a prototypical utterance of (i b).
(ii) Same context
a. Henry thinks she left town, but usually what he says are just random guesses.
b. She's left town, Henry thinks, but usually what he says are just random guesses.


It seems to me that Spanish is in general no different in this regard, but this would have to be 
shown from controlled speaker judgments. 

189 “the insertion of an incidental element with its own character outside of the melodic structure 
of the phrase in which it is found”, my translation.
“para subrayar expresiones especialmente intencionadas e importantes”.¹⁹⁰ This
suggests the possibility that such parentheticals can also occasionally be at-issue in
Spanish, and that this might change their accentuation behaviour. It would be desirable
to unify these accounts and to analyze the absence of accentuation in utterance-final
evaluative verb phrases such as QF16 MT ES 2251 and other examples like it
in our data together with the “parentheticals” observed by Navarro Tomás (1968
[1944]) and Ortega-Llebaria & Prieto (2007, 2011). However, this would require actually
showing that they can be accented when they are at-issue, for which there is no
data so far (see footnote 187).

Returning to our own data, these examples are curious because their contexts 
do not suggest the presence of a [REVERSE] feature, i.e. no difference in relative pola- 
rity between provocation and response, and there is no single identifiable oppo- 
site alternative salient in the context, against which the at-issue content is asserted 
(cf. Roelofsen & Farkas 2015: 385). And yet the deaccentuation in such cases even 
extends to entirely new information, as in KP04 Cuent ES 2317 (see Figure 72). 


Figure 72: KP04 Cuent ES 2317¹⁹¹ (el nietecito le dio un besito en la naríz / ahí terminó el cuento ‘the
little grandson gave him a peck on the nose / there the story ended’; declarative with deaccentuation 
after naríz). 


In this example, KP04 is finishing his re-telling of the story in Cuento (after TP03 
first told it to him). The contents of the story itself end with naríz, up to which every 
accentable word is accented. This is followed by a silent pause (nearly 1 s), and then 
the additional phrase ahí terminó el cuento follows, which is deaccented. Note that 
both examples of deaccentuation of “parentheticals” seen here are not responses
but provocations themselves, so that there can't be a [REVERSE] feature present in 


190 “to highlight particularly intended and important expressions", my translation. 
191 https://osf.io/j2mpv/ 
the context. KP04’s is an interesting example not only because the information contained
in the deaccented part is new (albeit inferable and somewhat formulaic),
but also because the deaccented material is separated from the accented material 
by such a long break. This stands in quite a marked contrast to what happens in 
SG15 MT ES 2396 (Figure 73 with context (60)). 


(60) SG15 QF16 MT ES 2291-2416 (context for SG15 MT ES 2396)¹⁹²
General context: directly following (59).

Speakers: SG15 (the one with the path), QF16 (the one without the path)
time  | speaker | utterance
229.1 | SG15    | ya por encima de la olla tienes que bajar
231.8 |         | (6.4)
238.2 | QF16    | y a la quena ya no llego
239.6 | SG15    | sí tienes que llegar
241.6 | SG15    | tienes que bajar


Figure 73: SG15 MT ES 2396¹⁹³ (declarative with accentuation reset after llegar).


Here, QF16 asks whether the path does not let him reach the landmark of the flute, 
using a confirmation-seeking question with negative polarity, y a la quena ya no 
llego (238.2). This is then the proposition on the table, with a confirmation bias for 
the negative polarity of the proposition (Farkas & Bruce 2010; Roelofsen & Farkas 
2015). At 239.6, SG15 reverses this bias with the polarity particle sí, which is pitch 
accented with very strong pitch excursion, followed by tienes que llegar, which is an 
at-issue addition and accented with less excursion, pitch falling globally through- 
out its realization. Nearly without a pause, this is then followed by tienes que bajar 


192 https://osf.io/9fkwu/ 
193 https://osf.io/xdz3q/ 
(at 241.6), which is clearly independently accented and also answering a separate
QUD, namely something like WHERE DO YOU HAVE TO GO? or possibly HOW DO YOU
GET THERE? However, with regard to the explicit QUD addressed at 239.6, DO I NOT GET TO THE
FLUTE ANYMORE?, tienes que bajar is just as non-at-issue as ahí terminó el cuento is
to the QUD that is answered by the first part in KP04 Cuent ES 2317, in addition to
having a much shorter break in between. Yet the former is not deaccented, while
the latter is. Whether this is because such parenthetical phrases are just much more
likely to be not at-issue, or because the difference simply lies in individual variation,
any analysis of deaccentuation using a division between at-issue and non-at-issue
material must take into account the interaction between what is prosodically
a separation between utterances (accompanied by pitch reset) and what constitutes
a separate move in the conversational game, as well as the difference in relative
polarity between a provocation and its response.

Summarizing, deaccentuation (variably realized as severely reduced pitch 
scaling) has been found to occur in the Huari Spanish data in various contexts. 
While it seems nearly categorical in alternative questions and wh-questions, in 
declaratives its occurrence is quite variable. According to this qualitative investiga- 
tion, the presence of a [REVERSE] relation between the utterance and a provocation 
seems to be a far stronger contributing factor than only a contrast between new 
and given material. The sole presence of the latter was not found in examples with 
deaccentuation. Deaccentuation was also found outside of reversals/denials on 
material denoting evaluative additions or parentheticals. Deaccentuation does not 
categorically cooccur with any of these contexts, with individual speaker prefer- 
ence being a possible additional factor. While a thorough quantitative exploration 
of the contexts of deaccentuation remains a task for the future, prosodically it can 
be assumed that the necessary (but not sufficient) condition for deaccentuation is 
that the highest metrical position in a phrase at least of iP-size is not final in that 
phrase but followed by further accentable material. 
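
The prosodic condition stated at the end of this summary can be made concrete in a small sketch. The Python below is an illustration only and not part of the analysis: the `Position` record, the numeric `strength` values and the function name are hypothetical. It assumes that a phrase is given as an ordered list of accentable positions, each with a relative metrical strength, and it checks only the necessary condition, not whether a speaker actually deaccents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    """One accentable position in a phrase (hypothetical representation)."""
    word: str
    strength: int  # relative metrical prominence; higher = stronger (assumed scale)


def deaccentuation_possible(phrase: List[Position]) -> bool:
    """True if the strongest position is followed by further accentable material.

    Encodes only the necessary condition; it does not predict whether the
    post-nuclear stretch is in fact deaccented. Ties are resolved to the
    earliest position, which is an arbitrary choice made for this sketch.
    """
    if len(phrase) < 2:
        return False  # nothing can follow a single position
    strongest = max(range(len(phrase)), key=lambda i: phrase[i].strength)
    return strongest < len(phrase) - 1


# Strength values are invented for illustration.
nucleus_early = [Position("sí", 3), Position("tienes", 1), Position("llegar", 2)]
nucleus_final = [Position("nietecito", 1), Position("besito", 1), Position("naríz", 2)]
```

With these invented values, `deaccentuation_possible(nucleus_early)` holds because accentable material follows the strongest position, whereas `deaccentuation_possible(nucleus_final)` does not.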


5.1.3.3 Tonal target placement and epistemic biases 

As a last item included in the intonational phenomena observed in simple Huari 
Spanish utterances, I will here briefly describe a shift in pitch accent peak align- 
ment and its possible pragmatic function. Because the examples are isolated, the 
discussion will have to remain exploratory. There are a number of utterances 
where the temporal alignment of both the final peak and its preceding elbow
relative to the stressed syllable diverges from that in the majority of the Spanish
data. This seems to correlate with contexts that suggest, in the broadest terms, an
attitude of epistemic bias on the part of the speaker with regard to the proposition
expressed.
Figure 74: HA30 MT ES 1372¹⁹⁴ (declarative with delayed peak on murciélago).


(61) ZR29 HA30 MT ES 1197-1405 (context for Figure 74)¹⁹⁵

Global map context: the bat (murciélago) is at different positions in the two
maps, so that HA30 cannot move between bat and pot at the point when ZR29
instructs her to do so (cf. Figures 190 and 191 in Appendix B).

Speakers: ZR29 (with the path), HA30 (without the path)
time  | speaker | utterance
119.7 | ZR29    | pasas
120.7 | ZR29    | por debajo del zorro
      |         | ‘you pass below the fox’
122.9 | HA30    | ya
123.4 | ZR29    | encima de la nube
      |         | ‘above the cloud’
125.4 | ZR29    | por medio de
128.8 | ZR29    | el murciélago y la olla vas pasar
      |         | ‘between the bat and the pot you'll go’
132.2 | HA30    | murciélago y la olla
      |         | ‘bat and pot’
133.6 | ZR29    | sí
134.1 | ZR29    | por el medio de esos dos pasas
136.4 | ZR29    | por encima
      |         | ‘yes between those two you pass above’
137.2 | HA30    | por ahí no veo murciélago
      |         | ‘I don't see a bat there’
139.9 |         | pues acá hay
      |         | ‘well here it is there’


194 https://osf.io/dkawc/ 
195 https://osf.io/pu9jg/ 
Figure 74 shows the utterance HA30 MT ES 1372, por ahí no veo murciélago. The
proparoxytonic murciélago is utterance-final here, as in (40)/Figure 23, but there,
the pitch elbow is in the pretonic and the accent peak is reached shortly after the
start of the vowel in the stressed syllable. In contrast, here the peak occurs only at
the end of the vowel, at the boundary to the posttonic syllable, and it is the preceding
elbow¹⁹⁶ which is formed right at the start of the vowel of the stressed syllable (after
the heightened pitch due to consonantal microprosody in the fricative segment). The
peak on murciélago also clearly has much more excursion than that on veo, and,
more strikingly, both the word and its stressed syllable have much more duration
relative to the preceding material in the utterance. The context for Figure 74 is given
in (61). In the global context, the bat (murciélago) is one of the landmarks whose
position differs between the two maps; therefore, HA30 cannot move between the 
pot (olla) and the bat when ZR29 instructs her to. Sequentially, at 122.9, HA30 signals 
acceptance of ZR29's preceding instructions; this is understood by ZR29 who pro- 
gresses towards directing the way around the next landmark on the map, which 
includes that HA30's path should move between the bat and the pot (123.4-128.8). 
This is the first time the bat (murciélago) is mentioned. At 132.2, HA30 suspends 
acceptance of those directions by uttering a request for clarification which takes 
the form of a repetition of the landmarks she should pass through, murciélago y la 
olla (the bat and the pot). ZR29 understands this to be a request for clarification; she
confirms that HA30 understood correctly (sí at 133.6) and gives a partial repetition 
of her previous instructions, but pronominalizing the landmark referents of bat and 
pot as esos dos (134.1–136.4). At 137.2 then, HA30 produces the utterance under discussion
here. Considering the preceding context, this utterance serves the purpose
of continuing the suspension of acceptance of ZR29's current instruction. It clarifies 
the reason for the suspension: while ZR29 accedes to the request for suspension 
and clarification in 133.6-136.4, she must assume that what is at-issue is the path 
between the referents murciélago and olla, not the referents themselves, as she pro- 
nominalizes them, thus presupposing their existence in the relevant part of the map. 
This presupposition is however what is contested by HA30 in 137.2 for the landmark 
referent murciélago. Her turn targets an answer not to the current QUD, but to one 
(IS THERE A BAT?) that lies far back in the discourse history and which was treated as 


196 There is also another elbow at the boundary of the pretonic to the stressed syllable, and the 
elbow within the stressed syllable is located somewhat higher than this earlier one. It is not clear 
which of these really is the target for the L tone; however, if we were to draw a straight line from 
the earlier elbow to the later one, it would have a considerably lower slope than the rise that takes 
place within the stressed syllable itself, and also than a line drawn from where the voicing ends to 
where it starts again in the corresponding section in Figure 23. Therefore we can probably say that 
the elbow is relatively later in Figure 74 than in Figure 23. 
settled (in CG) by both interlocutors previously. She thus brings it back on the table,
making it at-issue once again and contesting the settlement. The referent of mur- 
ciélago is thus not saliently contrasted with another available referent, but instead a 
contrast is evoked between presupposing the referent of the expression murciélago 
(and its position on the map) and being unable to find a fitting referent for it, which 
is what HA30 expressly communicates here (por ahí no veo murciélago meaning here
something like “in the relevant part of the map, I cannot find a referent that would 
fit the description of a bat”). I suggest that this is achieved by including a [REVERSE] 
feature in the context update effected by this utterance, which specifically targets 
the presupposition that the bat is there, not the provocation, in an extension of how 
the [REVERSE] feature is used in Roelofsen & Farkas (2015). It seems plausible that 
this presupposition challenge, pointing to a discrepancy between the two speakers 
regarding the status of the referent of murciélago in the common ground, is cued by 
the marked prosodic realization of murciélago with delayed tonal alignment here. 
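
The alignment difference described for Figure 74 and Figure 23 (peak shortly after the start of the stressed vowel versus peak at its end) can be expressed as a position relative to the vowel. The following Python sketch is only an illustration of that idea: the function name is hypothetical, and the millisecond values are invented rather than measured from the recordings.

```python
def relative_alignment(t_event_ms: float, vowel_onset_ms: float, vowel_offset_ms: float) -> float:
    """Locate a tonal event (elbow or peak) within the stressed vowel.

    Returns 0.0 if the event coincides with the vowel onset and 1.0 if it
    coincides with the vowel offset; values below 0 or above 1 mean the
    event falls before or after the vowel.
    """
    duration = vowel_offset_ms - vowel_onset_ms
    if duration <= 0:
        raise ValueError("vowel offset must be later than vowel onset")
    return (t_event_ms - vowel_onset_ms) / duration


# Invented timestamps (ms), for illustration only; not corpus measurements.
vowel = (500.0, 620.0)
early_peak = relative_alignment(530.0, *vowel)     # peak shortly after vowel onset
delayed_peak = relative_alignment(620.0, *vowel)   # peak at the end of the vowel
delayed_elbow = relative_alignment(500.0, *vowel)  # elbow at the start of the vowel
```

Normalizing by vowel duration is one possible design choice; it keeps the measure comparable across tokens whose stressed syllables differ in length, as they do in the examples discussed here.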


Figure 75: MD40 MT ES 1542¹⁹⁷ (clarification question with delayed peak on perro, and declarative
with normally aligned peak). 


(62) SO39 MD40 MT ES 1257-1580 (context for Figure 75)¹⁹⁸

Global map context (cf. Figures 190 and 191 in Appendix B): the positions of the 
lamb (corderito) are very different in the two maps: in SO39's map (with the 
path), it is in the lower half, whereas in MD40’s (without it) it is at the top of the 
upper half, above the fox (zorro), while the dog (perro) is in the lower half. The 
position of the lady (señora) is also different between their maps: in SO39’s it is 
at the top of the lower half on the right side, directly below the lamb, while in 
MD40's it is on the left upper side of the lower half.


197 https://osf.io/evu35/ 
198 https://osf.io/94w2h/ 
Speakers: SO39 (with the path), MD40 (without the path)
time  | speaker | utterance
125.7 | MD40    | del perro me doy vuelta por la señora
      |         | ‘from the dog I turn around the lady’
128.5 | SO39    | mhm por encima por la mitad con el corderito lo es- que está por la mitad
      |         | ‘yes above through the middle with the little lamb- that is in the middle’
136.3 | MD40    | ahí no hay
      |         | ‘it's not there’
137.1 | MD40    | tienes de por el perro
      |         | ‘you gotta [say] via the dog’
141.6 | MD40    | del perro
142.8 | SO39    | del perro pasa a la señora no
      |         | ‘from the dog it goes to the lady right’
146.0 |         | por debajo por [en-
      |         | ‘below ab-’
147.4 |         | [por en]cima
      |         | ‘above’
148.2 |         | mhm ya
149.3 | SO39    | mhm ya pasar corderito por su debajo
      |         | ‘yeah passing the little lamb from below’
154.2 | MD40    | perro
155.0 | MD40    | no este es el zorro
      |         | ‘dog no this is the fox’
158.8 |         | (laughter)


The next example of a delayed peak cueing an epistemic bias is MD40 MT ES 1542, 
perro no este es el zorro (Figure 75), with context (62). The relevant element with 
marked tonal target placement is the first one, perro. As in the preceding example, 
there is a late tonal target realization, with the low pitch elbow realizing the L tone 
produced just at the beginning of the vowel in the stressed syllable, and the peak 
realizing the H tone at the end of the vowel. This is followed by a return to low in the 
posttonic and then a further rise at the end of it (visible only partially in Figure 75
because of some creakiness in the voice, but indirectly evidenced by the initial fall 
from high in the next word, no, and also clearly perceptible auditorily). The fall-rise 
movement in the posttonic is here taken to evidence the presence of LH% bound- 
ary tones, which were found associated with polar questions (see section 5.1.2.1). 
A conventional orthographic transcription might be something like “¿¡Perro?!
No, este es el zorro.” Note that in the segmentally similar zorro at the end of the
example, the pitch peak is reached clearly earlier, just about after the midpoint of 
the stressed vowel, and with a considerable part of the following fall taking up its 
remainder (the auditory impression is also clearly very different). The context (62) 
helps interpret this example: globally, it has to be kept in mind that the lamb (corderito)
is at the top of the lower half of SO39's map, but at the top of the upper half in
MD40’s. In both maps, the dog (perro) and the fox (zorro) are roughly in the middle
of the lower and upper half, respectively. The lady (señora, actually an image of a
cartoon figure with a bag of money) is in the upper right part of the lower half in
SO39's map, but in the upper left part of it in MD40’s map. At 125.7, MD40 suggests
a path for how to continue from the dog, which is confirmed with a confirmation
token (mhm) by SO39, who follows it up with an elaboration of instructions about
how to move from there via the lamb at 128.5. MD40 then points out that the lamb
is not there, and tells SO39 to repeat the instructions using the dog as starting point
(136.3–141.6). SO39 obliges and asks to make sure that the immediate path from the
dog to the lady is agreed upon (142.8). They then discuss whether to go above or
below, and having settled this, SO39 continues the instructions on how to proceed,
passing below the lamb (146–149.3). At this point, MD40 then produces the utterance
under discussion. The perro-part, with severely delayed peak, here resembles
the move performed by step b in the confirmation-seeking question sequence dis- 
cussed in section 5.1.2.2. It is also a clarification request (one for intended content 
according to Ginzburg 2012: 149–150; Lupkowski & Ginzburg 2016: 250–251), but
unlike the examples discussed there, it is not biased for confirmation, but for dis- 
confirmation (an “incredulity question”): as the elaboration shows, MD40 intends 
to correct the identity of the referent they have been referring to as perro, com- 
mitting to the proposition that the relevant referent is instead correctly labeled as 
zorro, the fox (the instructions as given for her only make sense if they refer to
the fox). The peak on zorro in this assertion is not delayed, as expected. Thus
we could again say that the context update conveyed by the utterance contains a 
[REVERSE] feature, which does not target a proposition on the table but one that is 
presupposed, namely that there is a dog there at the relevant location or that the 
entity there is correctly referred to as a dog. 
Figure 76: TP03 MT ES 1312¹⁹⁹ (declarative with deaccentuation after and delayed peak on arriba).


(63) TP03 KP04 0906-1050 and 1211-1371 (context for TP03 MT ES 1312)²⁰⁰
General context (cf. Figures 190 and 191 in Appendix B): the lamb, the millionaire
(niño millonario), the rock, the bat and the dungheap are in different locations
in the two maps. They already had a conflict, because KP04 would have needed
to do a complete circle (círculo) around the lamb back to the millionaire in
order to follow TP03's instructions and objected. TP03 describes how to
proceed from under the dog (debajo del perro). In the omitted part, they have
a similar conflict about having to do a complete circle. TP03 then restarts by
explaining how to move from the dog.


Speakers: TP03 (with the path), KP04 (without the path)
time        | speaker | utterance
90.6        | TP03    | pasa por este
            |         | ‘goes by that’
93.1        | KP04    | por los pies
            |         | ‘the feet’
93.6        | TP03    | ajá por los pies
            |         | ‘yeah the feet’
94.8        | TP03    | ya ahí detente un rato
            |         | ‘right stay there a while’
97.0        | TP03    | y va ir
            |         | ‘and it'll go’
98.7        | TP03    | de lo que están ahí este
            |         | ‘from the one they're there that’
100.7       | TP03    | de lo que están enterrando
            |         | ‘from the one they're burying’
103.7       |         | de lo que están enterrando
            |         | ‘from the one they’re burying’
104.7       |         | ajá
            |         | ‘yeah’
105.0-121.1 |         | [...]
121.1       |         | ya por debajo del perro ha pasado
            |         | ‘right below the dog it's passed’
123.1       |         | sí por encima del niño
            |         | ‘yes above the boy’
125.0       | TP03    | ya
125.6       | TP03    | por encima del niño está pasando no te vas a ningún círculo
            |         | ‘right it's going above the boy don't go in any circle’
128.0       |         | ya
128.6       |         | ya
130.0       | TP03    | sin ce- sin cerrar nomás vas hasta arriba hasta donde que están enterrando
            |         | ‘with- without closing [the circle] you go upwards up to where they're burying’
133.8       |         | ah ya paso por encima nomás
            |         | ‘ah right I just go up’
135.5       |         | ajá pasas por encima nomás pe
            |         | ‘yeah right you just go up’


199 https://osf.io/8gk74/
200 https://osf.io/cgk7u/


The last example is the most complicated one. In (63), TP03 first mentions the landmark
he calls lo que están enterrando (“that which they are burying”, an image of
a funeral) at 98.7-100.7, which is supposed to serve as departure point for his next
instruction. This causes some confusion for KP04, who repeats that he then has to
do a full circle (in the omitted section). Doing a circle around the millionaire at
this point means moving downwards on the map for TP03, for whom the current
position on the path is directly above the millionaire (encima del niño) at 125.6.
TP03 explicitly rejects going full circle in 125.6. In 131.2 (Figure 76), he then gives 
the proposition that corrects not only the circle path, but also what this implies for 
him, namely moving downwards. Since going full circle entails going downwards 
but not the other way around (from TP03’s point of view at this moment), the corresponding
QUD DOES THE PATH GO DOWNWARD? is superordinate to the one asking
DOES THE PATH DO A FULL CIRCLE AROUND THE MILLIONAIRE? (cf. Roberts 2012b). TP03
must assume that KP04 presupposes that the path goes down here, and that this is
the cause of their misunderstanding.²⁰¹ When uttering 131.2, he corrects this presupposition
(the proposition that the path goes downwards, which he believes KP04
to believe to be in the common ground), i.e. his move contains a [REVERSE] feature
targeting it, and it also asserts its complement, to go upwards. To this, going to
where they are burying is a non-at-issue addendum, and deaccented as expected
from the preceding section. I argue that the delayed peak on arriba here cues the
presence of a [REVERSE] feature in the context update targeting not the proposition
on the table but a presupposition, as in the previous two examples.

In sum, I tentatively propose that what unites these three examples with 
delayed peaks is the presence of a [REVERSE] feature targeting a presupposition 
(a previous QUD), instead of the proposition on the table (the current QUD). Note 
that we dealt here with both assertions and a clarification request, suggesting this 
is some kind of (modal) non-at-issue meaning²⁰² component that is orthogonal to
the difference between these speech acts. The clarification request by MD40 seems
almost the direct opposite to those discussed in section 5.1.2.2 in terms of bias,
while for assertions the difference is more complex.²⁰³ However, formally the similarity
between biased questions and assertions discussed there seems to be maintained
here too, with the cue for the additional meaning being a delayed peak in


201 Previously at this location, KP04 had always made a circle instead of doing what TP03 instructed
him to do. Because the circle around the lamb from his position at this point could only be
made by going downwards, TP03 must assume that KP04 presupposes that the path goes downwards
there (in reality, the relevant objects on their maps are differently placed).

202 In the form perhaps of a conventional implicature (cf. Bianchi et al. 2016; Fliessbach 2023), a 
presupposed modal meaning (Reich 2018), or an illocutionary operator (Faller 2014). 

203 Assertions put a proposition p on the table and project its acceptance, while neutral polar
questions of p put {p, ¬p} on the table and only project the acceptance of either outcome (Farkas
& Bruce 2010: 92, 95). Biased assertions and questions pose a more difficult case. Fliessbach (2023:
67–68) differentiates bias and commitment based on Farkas & Roelofsen (2017), and argues that
both can be reacted to in discourse. However, he treats bias as a gradable but absolute value, while
I would argue that the polarity of bias is also relevant, since questions with a bias for confirmation
and ones with a bias for disconfirmation seem to have different forms. It is an open question how
that would have to be modeled, and whether something like negative bias perhaps emerges out 
of a meaning like the one I propose. As argued above, the assertions and questions with negative 
bias discussed here should be seen as conveying an additional non-at-issue meaning component 
consisting at least of a [REVERSE] feature targeting not (only) the proposition on the table (at least 
TP03's utterance should be seen as a provocation, not a response, anyway), but a presupposition 
for a salient proposition in the context that is complementary to the proposition that is asserted. 
both cases. The contrast conveyed here between a presupposition and the proposed
context update is clearly different from that between two alternative referents or
propositions (that obtains in some of the deaccentuation cases from the previous
section), suggesting that a label of “contrastive focus” would be underspecifying
in either case. There are further similar examples in the Huari Spanish data, but
I suggest that the phenomenon would be better served by treating it as the
main focus of separate further research.
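
The contrast drawn in footnote 203 between assertions and neutral polar questions, and the idea of a [REVERSE] feature that targets a presupposition rather than the proposition on the table, can be illustrated with a toy model. The Python below is a deliberately simplified sketch under my own assumptions: propositions are plain string labels, `Move`, `neg`, `assertion`, `polar_question` and `reverses` are hypothetical names, and nothing here reproduces the formal machinery of Farkas & Bruce (2010) or Roelofsen & Farkas (2015).

```python
from dataclasses import dataclass
from typing import FrozenSet


def neg(p: str) -> str:
    """Complement of a proposition label, written with a leading '¬'."""
    return p[1:] if p.startswith("¬") else "¬" + p


@dataclass(frozen=True)
class Move:
    on_table: FrozenSet[str]   # what the move puts on the table
    projected: FrozenSet[str]  # outcomes whose acceptance the move projects


def assertion(p: str) -> Move:
    """An assertion puts p on the table and projects its acceptance."""
    return Move(frozenset({p}), frozenset({p}))


def polar_question(p: str) -> Move:
    """A neutral polar question puts {p, ¬p} on the table and projects either outcome."""
    both = frozenset({p, neg(p)})
    return Move(both, both)


def reverses(asserted: str, target: str) -> bool:
    """Simplified [REVERSE]: the asserted proposition is the complement of the target."""
    return asserted == neg(target)


# Toy version of the HA30 example: the target is a presupposition,
# not a proposition currently on the table (labels are invented).
presupposed = "bat-is-there"
ha30_move = assertion(neg(presupposed))
targets_presupposition = all(reverses(p, presupposed) for p in ha30_move.on_table)
```

In this toy setting, `targets_presupposition` is true although `presupposed` was never put on the table by a preceding move, which is the configuration proposed for the delayed-peak examples.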


5.1.4 Interim summary 


Before moving on to the analysis of more complex utterances in the next section, I 
review the findings made in this section. I have described the intonation of Huari 
Spanish declaratives and interrogatives. In general, nearly every accentable syl- 
lable has been found to be pitch accented with an LH*, whose peak is regularly 
aligned within the stressed syllable, independent of position in a phrase (speech 
with these attributes has here been called the main accentuation variant). The ana- 
lysis has revealed that there is comparatively little paradigmatic variability in tone 
choice. The bitonal LH* is possibly the only attested pitch accent, and equally, only 
three boundary tones (L-/%, H-/%, LH-/%) could be identified. Only the delayed peak 
found in the contexts discussed in section 5.1.3.3 might be analysed as a separate 
pitch accent, L*H or perhaps L*«H*, but it is still clearly relatable to LH*. On the 
other hand, however, two variant phenomena of pitch accentuation that cue infor- 
mation structural and discourse-pragmatic meanings were found, phrase accentu- 
ation and deaccentuation. Both of these accentuation modes are characterized by a 
marked syntagmatic contrast between accentuation of accentable positions in dif- 
ferent parts of an utterance, with gradual steps between weaker and stronger tonal 
compression or even deletion. This also highlights the relevance of pitch scaling 
for intonational description. Their occurrence was found to be variably conditi- 
oned by discourse context, but is also subject to individual speaker preference, with 
one speaker showing signs of using the phrase accentuation even as a default. The 
phrase accentuation effectively constitutes a more phrase-optimizing intonation 
than the main accentuation or other familiar intonational systems of varieties of 
Spanish, which can be said to optimize prosodic words instead. In the final section, 
it was suggested that variability in peak alignment on stressed syllables might 
also be associated with differences in pragmatic meaning. In the next section, the 
results from this section will be built on in the discussion of more complex double 
topic-utterances, and integrated into a detailed analysis of the levels of the prosodic
hierarchy that are evidenced in them. It will be shown that pitch scaling again plays 
a decisive role in signaling a prosodic structure with nonlocal dependencies. 
5.2 Complex utterances: “Double topic”-constructions
and hierarchical tonal scaling structure 


This section expands the analysis of Huari Spanish to a specific type of complex 
utterances, the double topic-utterances from the Elqud experimental task. Their 
analysis will demonstrate that pitch scaling at the level of register heights is sensitive
to a hierarchical prosodic structure that is recursive. In a second part, utterances
of this type displaying the type of variation called “phrase accentuation” in the
previous section will be considered separately. 


5.2.1 Data 


The utterances discussed here come from Elqud items 14, 26, 43, 52, 59, 65, and mar- 
ginally 44. As described in section 2.4, Elqud is a corpus from an elicitation setup 
in which participants were asked to utter corrections to recorded utterances (the 
audio stimulus) as if the speaker of the audio stimuli were present in the conversa- 
tion, based on differences between what the audio stimulus asserted and a simulta- 
neously shown image or short animation (the visual stimulus). Although speakers 
were encouraged in general to respond in detail and expansively, using rather more 
than fewer words, no restrictions were enforced during the experiment on how the
speakers should respond in each case. This led to some items by some speakers con- 
sisting only of generic objections such as no es correcto ‘that’s not correct’, es falso 
‘that’s wrong’ or no, es al revés ‘no, it's the other way round’, which are not useful 
for the analysis, but it also means that when speakers did respond completely, they 
did so in a way that was presumably more natural to them than if they had been 
told to respond observing a certain syntactic pattern, certain words, or suchlike. 
Such responses that were successful in terms of the experimental aim will be called 
‘full responses’ in the following. Some responses also had to be excluded from the 
analysis because even though speakers did attempt to give expansive responses, 
they noticeably got confused, mixed up parts of their utterance or broke off after an 
incomplete attempt. This is a normal feature of uncontrolled speech under any
conditions, and it happened with all speakers occasionally. In those cases, the
experimenters made no attempt during the experiment to ask for another try;
instead, they simply proceeded to the next item. Items 14, 26, 43, 52, 59, 65 and
44 are presented in Table 11 by giving the text of the recorded audio stimulus and 
a description of what differed in the visual stimulus from what the subjects heard. 

These examples (both the recorded stimuli and the ‘full’ responses) have in 
common that they consist of two different predications made about two different 
referents. In terms of the QUD-model of discourse (cf. Roberts 2012b), they can be 
Table 11: Stimuli for the Spanish ELQUD items 14, 26, 43, 52, 59, 65, 44. Pitch accented syllables are
underlined, words standing in a contrast relation with others in the utterance are capitalized.

Item number: 14
Text of audio stimulus: El HOMBRE está COMIENDO y la MUJER DURMIENDO
    ‘The man is eating and the woman is sleeping’
Description of visual stimulus / mismatch: The man is sleeping while the woman is eating

Item number: 26
Text of audio stimulus: El PERRO está DEBAJO de la roca y el GATO está ENCIMA de la roca
    ‘The dog is below the rock and the cat is above the rock’
Description of visual stimulus / mismatch: The dog is above the rock and the cat is below the rock

Item number: 43
Text of audio stimulus: El colibrí ROJO está chupando la flor AMARILLA y el colibrí VERDE la flor ROJA
    ‘The red hummingbird is drinking from the yellow flower and the green hummingbird from the red flower’
Description of visual stimulus / mismatch: The red hummingbird is drinking from the red flower, the green hummingbird is drinking from the yellow flower

Item number: 52
Text of audio stimulus: El colibrí VERDE está chupando la flor que está a la IZQUIERDA el colibrí AZUL está chupando la flor que está a la DERECHA
    ‘The green hummingbird is drinking from the flower that is on the left, the blue hummingbird is drinking from the flower that is on the right’
Description of visual stimulus / mismatch: The green hummingbird is drinking from the flower on the right, the blue hummingbird is drinking from the flower on the left

Item number: 59
Text of audio stimulus: El zorrillo PEQUEÑO está DETRÁS de la casa PEQUEÑA, y el zorrillo GRANDE está al FRENTE de la casa GRANDE
    ‘The small skunk is behind the small house and the large skunk is in front of the large house’
Description of visual stimulus / mismatch: The small skunk is in front of the large house, the large skunk is behind the small house

Item number: 65
Text of audio stimulus: El zorrillo GRANDE está al frente de la casa GRANDE, y el zorrillo PEQUEÑO está al frente de la casa PEQUEÑA
    ‘The large skunk is in front of the large house and the small skunk is in front of the small house’
Description of visual stimulus / mismatch: The large skunk is in front of the small house, the small skunk is in front of the large house

Item number: 44
Text of audio stimulus: El perro NEGRO está jugando con la pelota ROJA y el perro BLANCO está jugando con la pelota AZUL
    ‘The black dog is playing with the red ball and the white dog is playing with the blue ball’
Description of visual stimulus / mismatch: No mismatch / filler


understood as providing an answer to a superordinate QUD asking about what pro- 
positions hold of a set of referents, e.g. {the people in the picture}, such as ‘what 
are the people in the picture doing?’ by providing answers to subordinate QUDs 
that are formed by asking the superordinate question about each member of the 
set of referents individually, i.e. {the man, the woman}, such as the man is sleeping 
and the woman is eating. For English, the way in which this kind of information
structural constellation is reflected in prosody has been discussed in terms of what 
has been called “A-accent” and “B-accent” since Jackendoff (1972) (cf. also Büring 
2003; Ladd 2008; Roberts 2012b). We can also say that the referents about which the 
propositions are predicated are sentence topics in the sense described by Roberts 
(2011) as that of the entity in a sentence to which our attention is first drawn and 
about which we are then told something, a referential restriction upon the domain 
over which the proposition is to hold.²⁰⁴ Complementarily, what is predicated of
them will be called comments. In the corpus of utterances discussed in this section, 
the set of entities about which propositions are predicated always consists of two 
members (the man and the woman, the small skunk and the large skunk, etc.). They 
will therefore be called “double topic”-constructions. Since the topics are pairs (as
members of the set of referents about which the superordinate QUD is asked) and 
the propositions that are predicated of them are also parallel to each other, there 
is a notion of contrast both between the two topics of each utterance and the two 
comments. This contrast manifests in at least one element that is different between 
the first and the second topic and comment, respectively. These contrasting ele- 
ments are highlighted by capitalization in Table 11. A further notion of contrast 
comes into play when considering the discursive relation between the utterances 
produced by the experimental subjects and those that serve as the audio stimuli. In 
the experimental items, the utterances assert that a different state of affairs obtains 
between the members of the set of referents and the predications made about them 
than that which is asserted by the audio stimuli (the referents and the predications 
themselves are the same, nothing new is added in the corrections, but their relation 
is asserted to be different). They are thus partial denials in terms of Farkas & Bruce 
(2010). The way they are interpreted here, as reflecting a complex QUD structure 
that answers a question with two variables (WHO IS DOING WHAT?), the comments 
are at-issue with respect to the current QUD for each utterance half (TOPIC 1 IS DOING
WHAT?, TOPIC 2 IS DOING WHAT?), while the topics are the answers to the question 
WHICH ARE THE TRUE MEMBERS OF THE SET OF AGENTS?, which is needed to answer 
the superordinate QUD (cf. Figure 77). Syntactically, the utterances consist of a con- 
junction of two sentences, the conjunction element either expressed by y ‘and’ or 
not (this is true both for the recorded stimuli and for the utterances produced by 
the speakers). The topics are realized as noun phrases consisting either of one (in 
the case of items 14 and 26) or two content words (a noun followed by an adjective, 


204 The other sense of “topic” that Roberts (2011) discusses is of course that of the ‘discourse topic’,
what a larger portion of discourse is about. She herself identifies this more global sense of topic
with the QUD (Roberts 2011: 1909), so that ‘discourse topics’ and ‘sentence topics’ in this conception
can be equated with superordinate and subordinate, or more global and more local, QUDs. 
in 43, 44, 52, 59, 65, with the adjective, never the noun, contrastive). The comments
are realized as VPs of various complexity, containing several content words: while 
in 14, they consist just of the auxiliary está and the verb in the present participle, in 
all other examples they contain at least a further NP that is either a direct object of 
a transitive verb (in 43, 52), a prepositional object of a locative predicate (in 26, 59, 
65) or an oblique prepositional object of a verb (in 44). In item 52, this NP is itself 
complex, containing a locative relative clause. 


[Figure 77: schematic with three columns: visual stimulus (“true” state of affairs), audio stimulus (“false” or “mixed-up” state of affairs; provocation), and speaker response (“correction” / reversal; response). “External” contrast (reversal) holds between provocation and response; “internal” contrast holds between the subordinate QUDs (topic 1, topic 2) and between the answers to the subordinate QUDs (comment 1, comment 2).]

superordinate QUD: Who is drinking what?
context-given sets of referents: {colibrí rojo, colibrí verde}, {flor roja, flor amarilla}
subordinate QUD 1: WHAT ABOUT THE RED HUMMINGBIRD, WHAT IS IT DRINKING FROM?
subordinate QUD 2: WHAT ABOUT THE GREEN HUMMINGBIRD, WHAT IS IT DRINKING FROM?
at-issue: {contrastive elements} in comment 1 & comment 2


Figure 77: Schematic representation of information structural relations between the speaker
responses and their stimuli and within the speaker responses themselves.


In the discussion of the actual utterances by the speakers, I will refer to the first 
element of the conjoined sentences, containing topic 1 and comment 1, as simply 
“the first part”, and to the second as “the second part”. Since speakers were free in how
to specifically realize their responses to the stimuli, they sometimes produced 
utterances in which their first topic was the topic of the second part in the audio 
stimulus and vice versa. Here we will not speculate about whether this reflects 
different perceived complex information structures of the utterances (as in “no,
the red flower is what the green hummingbird drinks from, not the yellow flower”
vs. “no, it is the green hummingbird that drinks from the blue flower, not the blue
hummingbird”). However, it is important to note that some of the stimuli allow
several interpretations with respect to their implied information structure (defin- 
able as implicit QUDs), and consequently, the utterances produced by the speakers 
might also reflect different information structures. The most important one will 
here be briefly discussed in simple and largely atheoretical terms: item 59 might be 
interpreted as having either two or three locations of mismatch/contrast that could 
be realized in the response (here given by numbered bracketing): 


(64) a. [The small skunk]₁ is [in front of]₂ [the large house]₃, [the large skunk]₁ is
        [behind]₂ [the small house]₃
     b. [The small skunk]₁ is [in front of the large house]₂, [the large skunk]₁ is
        [behind the small house]₂


Utterance 65, on the other hand, does not allow for this contrast on the preposition 
in the comments, because the preposition is the same in both comments. 

The audio stimuli and the full responses differ with regard to their length and
the number of words on which we can expect pitch accents to occur. The default 
assumption here will be to expect a pitch accent on the lexically accented syllable 
of every accentable word, in accordance with the results of section 5.1.1.2. In Table 
11, all accented syllables in the audio stimulus are underlined. From that we can 
see that item 14 has 5, 26 has 8, item 43 has 10, items 44, 59 and 65 each have 12, and 
item 52 has 14 accented syllables. In the responses by the speakers, we can similarly 
expect an increasing number of accentable (and accented) words across the items 
in this order, but due to the relatively free experimental setup not necessarily the 
exact same number as in the stimuli. In the following we will investigate system- 
atic patterns in pitch accentuation and scaling in the utterances produced by the 
speakers as a reflection of differing degrees of complexity in the prosodic structure,
which in turn can be correlated with information structure. 

I will use the analysis of these double topic-utterances to provide evidence 
for the hypothesis that pitch scaling in these data reflects a hierarchical prosodic 
structure that is likely recursive, similar to what has been shown for English (Ladd 
1988) and German (Féry & Truckenbrodt 2005; Truckenbrodt & Féry 2015) as well 
as other languages, as discussed in section 3.6. 

In sections 5.2.2 and 5.2.3, I will establish the general pattern that the majority 
of these utterances follows via both quantification and analysis of individual examples.
I argue that this majority represents the “main” variant of complex utterance
accentuation in the Huari Spanish data, in parallel to what has been established in
section 5.1 for simple utterances. In sections 5.2.2–5.2.5 we will then consider
deviations from this pattern, especially those that run in parallel to the “phrase
accentuation” variant also already observed in section 5.1. In the remainder of this section,
I will discuss some more general observations characterizing “full” responses by
the speakers. They differ from speaker to speaker and example to example in the
degree to which they are “reduced”, both syntactically and prosodically.
Syntactically, an unreduced utterance would be one in which both parts are realized as full
sentences, i.e. without ellipsis. Unreduced utterances are the majority here. Also 
frequent is ellipsis of given elements, with two extreme examples of such reduction 
being item 43 by LJ22 and TP03: 


(65) LJ22 ELQUD ES 43 (cf. Figure 98) 
el colibrí verde flor amarilla y el colibrí rojo flor roja 
‘the green hummingbird yellow flower and the red hummingbird red flower’


(66) TP03 ELQUD ES 43 
el colibrí verde está chupando la flor amarilla y rojo rojo 
‘the green hummingbird is drinking from the yellow flower and red red’ 


While LJ22 does not produce any verbal elements in his utterance, presumably 
because they are already given in the audio stimulus, TP03 eliminates all elements 
in the second part that would be given from their occurrence in the first part, i.e. 
all elements that are not contrastive. Ellipsis can therefore occur on elements that 
are given through the (external) relation to the provocation (the audio stimulus), 
or through the (internal) relation to preceding parts of the utterance itself. Contras- 
tive elements were preserved in most of these cases. Utterances where they were 
elided were not counted as full responses and excluded from the analysis. Ellipsis 
was observed to differ both across speakers and utterances, with TP03 and LJ22 
and item 43 most prone to it, and e.g. speaker OZ14 and item 65 at the other end of 
the scale. 

Prosodically, reduction here is above all a phenomenon affecting absolute pitch 
range within an utterance. This is largely speaker-dependent: some, as for example 
ZE55, employ a large range, others, such as ZZ24, seem almost to make an effort to 
reduce their pitch range as much as possible (see Table 12). 

For that reason, in the quantitative analysis I will employ a transformed pitch 
measurement that relates an individual measured value to the maximum and 
minimum measured value in the utterance. The results will show that even though 
pitch range differs so much between speakers, overall this does not affect relative 
pitch height, i.e. the tonal scaling relationship between individual elements within 
each utterance. Besides a relatively small pitch range, the utterances by the speak- 
ers who exhibit it are also characterized by being quite reductionist in other ways: 
for LJ22, several assimilative processes take place in that for instance, unstressed 
vowels are often centralized towards schwa or even the whole syllable is elided, 




Table 12: Mean pitch range in the double topic-utterances of
ELQUD_ES according to speaker.

Speaker   Mean pitch range (Hz)   Pitch range (st from mean minimum)²⁰⁵   N (utt.) per speaker
OZ14       50.8                    7.0                                     6
QP44       43.0                    6.4                                     5
QZ13       59.8                    7.0                                     6
SG15       87.8                    7.6                                     3
TP03       44.4                    6.5                                     3
XJ45       63.3                    8.8                                     5
ZZ24       31.2                    4.2                                     6
ZE55      141.5                   11.4                                     5
NQ01      117.9                   11.3                                     7
LJ22       46.9                    6.9                                     5


nearly all consonants are produced voiced while some voiced ones are turned 
into approximants, and complex onsets are reduced, yielding something like e.g. 
[gulə'wi:] for colibrí in LJ22_ELQUD_ES_43, [bə'gě:ju] for pequeño in LJ22_ELQUD_ES_59.
ZZ24 or TP03 mostly maintain unvoiced consonants but also frequently
reduce unstressed syllables. Probably such “reduced” speech has a socioindexical 
component apart from perhaps expressing some boredom at the experimental 
task: note that all speakers having a mean range of below 7 st are male. Different 
local varieties of a language do seem to differ in the mean pitch excursion size they 
employ for the same tonal categories (e.g., Liverpool speakers of English use a very 
small pitch range compared to speakers of other British varieties for the encoding 
of largely similar tonal inventories, cf. Nance et al. 2018), so pitch range is likely also
socially meaningful below the level of regional varieties.
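
The transformed pitch measurement mentioned above can be illustrated with a simple min-max normalization. This is only a sketch: it assumes that each value is expressed as a proportion of the span between the utterance minimum and the utterance maximum, which may differ in detail from the transformation actually applied in the analysis. The function name `relative_height` and all example values are hypothetical and not taken from the data.

```python
def relative_height(f0_hz, utt_min_hz, utt_max_hz):
    """Express an F0 value as a proportion of the utterance's pitch span.

    Assumption for illustration: 0.0 corresponds to the utterance minimum
    and 1.0 to the utterance maximum.
    """
    span = utt_max_hz - utt_min_hz
    if span <= 0:
        raise ValueError("utterance maximum must exceed utterance minimum")
    return (f0_hz - utt_min_hz) / span


# Two hypothetical utterances: one with a wide and one with a narrow range,
# but with the same relative scaling of their three measured points.
wide = [relative_height(v, 100.0, 240.0) for v in (240.0, 170.0, 100.0)]
narrow = [relative_height(v, 100.0, 130.0) for v in (130.0, 115.0, 100.0)]
```

Under this assumption, both hypothetical utterances yield the same relative values even though their absolute ranges differ, which is the property the relational measurement is meant to capture.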


5.2.2 Analysis of individual examples / “main” variant 


QZ13_ELQUD_ES_52 (Figure 78) is a good example to observe the hierarchical 
scaling relation found in the double topic-utterances. Experimental item 52 is most 
complex both in terms of number of accentable words and internal syntactic struc- 
ture, as seen from Table 11. 


205 Range in semitones was obtained by taking the bottom value as the one from which the differ- 
ence to the top one is calculated using the f2st function from the hqmisc package in R (Quené 2014). 
The formula for conversion to st from two f0 values a and b in Hz is 12*(log2(a/b)), where b is the 
starting point from which the difference to a is to be calculated. See also Traunmüller (2005, 2017). 
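
The conversion formula given in this footnote can be sketched in a few lines of Python. The sketch follows the formula 12*(log2(a/b)) as stated here; it is not the implementation of the f2st function itself, and the function name `hz_to_semitones` is hypothetical.

```python
import math


def hz_to_semitones(a_hz, b_hz):
    """Distance in semitones from the reference b_hz to a_hz: 12 * log2(a / b)."""
    if a_hz <= 0 or b_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(a_hz / b_hz)


# A doubling of F0 relative to the reference corresponds to one octave.
octave = hz_to_semitones(200.0, 100.0)
```

Taking the lower value as the reference b, as described above, yields a positive range; swapping the arguments only changes the sign.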
Figure 78: QZ13_ELQUD_ES_52²⁰⁶ (el colibrí verde está chupando la flor de la derecha y el colibrí azul
al²⁰⁷ flor de la izquierda ‘the green hummingbird is drinking from the flower to the right and the blue
hummingbird from the flower to the left’).


From Figure 78, it becomes clear that pitch height on each accented word is neither 
the same nor simply declining steadily, but instead seems to follow a more complex 
pattern. On the topics of both parts, the pitch accents on the noun and the following 
adjective are almost the same height or the second is slightly higher, no downstep 
takes place between them. Relative to the pitch height of the topics, the comments 
are downstepped: all pitch accents are scaled clearly lower than those of the topics. 
Within the first comment, there is further differentiation: in the verbal complex, 
the pitch accent on the auxiliary está, if accented at all, is scaled lower than on 
the verb chupando. The pitch accent on flor is downstepped relative to that on 
chupando, but it is very clearly realized nonetheless. The last pitch accent in the 
comment, on derecha, is upstepped relative to the preceding one: it reaches about 
the same height as the one on chupando again. The same overall scaling relation 
can be observed in the second part, although the verb here is not realized; but 
the topic is scaled higher than the comment overall, and the last element of the 
comment, izquierda, is not downstepped, but, if anything, upstepped. Literally on 
top of all this comes the scaling relation between the two parts of the utterance: 
part two is scaled to a lower pitch height, overall, than part one, while preserving 
the internal scaling relations that can also be observed in part one. 

Figure 79 illustrates the downstep relations that obtain between the prosodic 
constituents of QZ13_ELQUD_ES_52 as described, via coloured reference lines:


206 https://osf.io/65yzm/ 

207 Similar occurrences of lacking gender and/or number agreement are frequent in the Huari 
Spanish data. This has often been attributed to Quechua contact influence (e.g. in Escobar 2000, 
2011) and is one of the most frequently cited features of what has been labeled “Andean Spanish”
(Andrade Ciudad 2021: 125). 
Figure 79: QZ13_ELQUD_ES_52 with reference lines added.


Figure 79 uses reference lines for register height to visualize the downstepping 
relations between the prosodic constituents in the example, a concept introduced 
in van den Berg et al. (1992) and developed further in Féry & Truckenbrodt (2005) 
and Truckenbrodt & Féry (2015). The blue reference line indicates overall H tone 
reference height for part one vs part two, the green line that for topic 1 vs comment 
1, the yellow line that for the downstep relation between the main subcomponents 
of comment 1, the verbal complex and the object, and the red line that for topic 2 vs 
comment 2. The reference lines show very well that a downstep-within-downstep 
relation obtains here between the prosodic constituents; such a conception exp- 
lains the partial reset that takes place on the second topic, where the pitch accents 
reach a height again that had previously already been passed below by some of 
the pitch accent in the first comment. A model using only global declination as a 
time-dependent effect taking place throughout the utterance or predicting pitch 
height of one pitch accent from the height of the preceding accent in an exponen- 
tial model, the latter of which was found to best describe downstep between pitch 
accents in Mexican Spanish by Prieto et al. (1996), would be hard put to explain 
the downstep relations found here. Note that the reference lines describe only the 
downstep relations between prosodic constituents larger than the prosodic word; 
there is more than one prosodic word (deducible from the presence of pitch accents 
on accentable words) in many of the smallest parts defined by the lines. Conside- 
ring this provides a way of reconciling the results in Prieto et al. (1996) with what 
is found here: in that study, only downstep within single NPs encompassing two 
to five accentable words bordering on the nonsensical and recorded in a reading 
task was investigated, so that no complex prosodic structure was assumed or pre- 
dicted that could interact with scaling. Their results therefore are really applicable 
to downstep within one domain, and provide evidence that within that domain in 
Mexican Spanish, pitch decay on peaks does follow an exponential model, just as 
it has been shown to do in English (Liberman & Pierrehumbert 1984).²⁰⁸ Coming
back to the example at hand, this consideration opens an alternative analysis for 
the downstep described by the yellow reference line in the figure: we might also 
suppose that the downstep between chupando and flor is just downstep between 
two prosodic words, and not reflective of a downstep relation between larger con- 
stituents. At this point, we cannot really decide this issue, but it will be taken up 
again when incorporating upstep into the discussion: it might be more parsimoni- 
ous to allow upstep only on nuclear pitch accents, i.e. those final in at least an iP/ 
PhP (cf. section 3.4.4), instead of just any prosodic word; this would then favour the 
partition as described here by the yellow reference line, with both está chupando 
and la flor de la derecha forming separate constituents. 

Regardless of how this issue is decided, it should be noted that the regularity
in these downstep relations makes a compelling argument against a fixed, non-recursive
prosodic hierarchy: taking prosodic words as the domain at which exactly
one pitch accent is assigned, and even leaving the downstep relation described by 
the yellow line aside, the downstep relations as described by the other lines force 
us to assign a phonological or intermediate phrase to the topic and comment each, 
whose downstep relations to each other are described by the green and red lines,
and the whole extent of those lines, i.e. part one and part two each, will then have 
to be an intonational phrase. However, as described by the blue line, there is a 
downstep relation between those two, which could not be the case if the non-recur- 
sive IP was really the maximal prosodic constituent: it would then be the maximal 
domain of all phonological rule application and hence no phenomenon in one IP 
could make reference to a preceding (or upcoming) IP, as would here be the case. 
If we take the yellow line to describe a downstep relation between prosodic cons- 
tituents larger than the prosodic word contained within a larger unit comprising 
the entire comment and then combining with the topic, the same problem arises 
already a step earlier in the rise through the constituents from bottom to top. These 
two alternatives, in the order they have been discussed, and a third, are schemati- 
zed in (67)a-c. 


208 The reading stimuli for the Prieto et al. (1996) study expanded the noun phrase rayo de luna 
‘ray of moonlight’ (two accents) to the maximal (and ambiguous in terms of parsing) rayo de la luna 
de mi mayo de la gala de la Lola ‘Lola’s gala’s moonlight ray of my May’ (five accents), while in the
second experiment of Liberman & Pierrehumbert (1984), the stimuli consisted of lists with two 
to five names of berries, i.e. e.g. blueberries and raspberries, or blueberries bayberries raspberries 
mulberries and brambleberries. 
(67) Three different prosodic structures for QZ13_ELQUD_ES_52
     [Three tree diagrams, (67)a-c, built from iP and IP nodes over the utterance
     el colibrí verde está chupando la flor de la derecha y el colibrí azul al flor de la izquierda]

(67)c is in a sense a compromise between (67)a and (67)b: it preserves the latter's
separate prosodic constituency of está chupando and la flor de la derecha,
but it does not add an intermediate prosodic level corresponding to the
comment of part one. Assuming that some kind of downstep rule applies between
all prosodic sisters under one node sequentially from left to right and that n-ary
instead of just binary branching is allowed (as in the proposal for German in
Féry & Truckenbrodt 2005: 234-236), (67)c would account for the downstep
relations observed in the utterance just as well as (67)a and (67)b. All three
alternative structures are problematic for another reason: when they are assumed, 
intermediate phrases, just like intonation phrases, are usually taken to end on a 
boundary tone. In Figure 78, pitch movement relatable to boundary tones is only 
really observable at the boundary between part one and part two (a high boundary 
tone) and at the end of the entire utterance (a low boundary tone), but probably 
not iP-finally where the structures in (67) only assume intermediate phrases to end, 
e.g. at the end of the two topics. In principle, at least two options are available for 
explaining this discrepancy that preserve the concept that intermediate phrases 
have boundary tones: 
1) There are boundary tones present at the right edges of all of the iPs, but they
   are not as unambiguously identifiable because
   a. they occur at a lower level of embedding, which somehow makes them less
      conspicuous, and/or
   b. at the level of phonological representation, they are ambiguous between
      boundary tones and pitch accents (association to a prosodic edge or accented
      syllable), and since they would occur here in such close proximity
      to an already existing pitch accent on a lexically accented syllable, they
      secondarily associate with that syllable (on precisely this phenomenon in
      Eastern European languages, see Grice et al. (2000); Ladd (2008)), making
      them seem part of the pitch accent in the phonetic implementation, and
      perhaps, they are more likely to do so when the edge they would normally
      associate with is lower-level or more deeply embedded than when it is
      high-level or less embedded (which is why they surface between part one
      and part two); or
2) There are no boundary tones present at the right edges of these prosodic
   constituents, and they have been wrongly identified as intermediate phrases.


In the discussion of double topic-utterances exemplifying a type of “phrase
accentuation” (section 5.2.4), we will see utterances where boundary tones are more clearly
present also in some of the places where (67) takes iPs to end.
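
The idea that a downstep rule applies between prosodic sisters from left to right, with embedded constituents inheriting the reference height of their mother, can be made concrete with a small recursive sketch. Everything specific here is an assumption for illustration only: the multiplicative factor of 0.8, the relative (unitless) heights, the class name `Constituent`, and the labels of the nodes, which loosely follow a structure with a nested comment in part one. Upstep on final accents is deliberately not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Constituent:
    label: str
    children: List["Constituent"] = field(default_factory=list)


def reference_heights(node: Constituent, reference: float = 1.0,
                      factor: float = 0.8) -> Dict[str, float]:
    """Map each terminal constituent to its register reference height.

    Assumed rule: each sister is lowered by `factor` relative to the sister
    on its left, and daughters inherit the reference height of their mother.
    """
    if not node.children:
        return {node.label: reference}
    heights: Dict[str, float] = {}
    for position, child in enumerate(node.children):
        child_reference = reference * factor ** position
        heights.update(reference_heights(child, child_reference, factor))
    return heights


# Hypothetical nested structure: two parts, each with topic and comment;
# the first comment is split into a verbal complex and an object.
utterance = Constituent("utterance", [
    Constituent("part 1", [
        Constituent("topic 1"),
        Constituent("comment 1", [Constituent("verb 1"), Constituent("object 1")]),
    ]),
    Constituent("part 2", [Constituent("topic 2"), Constituent("comment 2")]),
])

heights = reference_heights(utterance)
```

In this toy model, the reference height of topic 2 comes out above that of the object in comment 1, so the partial reset on the second topic falls out of the nesting itself rather than requiring a separate reset rule.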

An alternative explanation for the scaling difference in the pitch accents, 
namely that it is simply due to whether or not a pitch accent is adjacent to a 
boundary tone, with high boundaries causing pitch accents in their vicinity to be
scaled high, and low ones causing them to be scaled low, can probably be discar- 
ded. On the one hand, this is because the final word in the second part, izquierda, 
is scaled as high as the preceding flor, even though only izquierda is followed by 
an L boundary tone, and it additionally comes after flor, so that it should be even 
lower due to declination or downstep. On the other, we might assume that if there 
are really iPs forming on smaller groupings here, that they are accompanied by 
their own boundary tones also when followed by an IP-level boundary tone, i.e. 
T-T%. As mentioned in section 3.4.7, this is a theoretical convention not usually 
followed in the literature on Spanish intonation, except for in Gabriel (2007). But if 
we were to follow this convention, it might be proposed that underlyingly, the tonal 
configuration on the final izquierda here is LH* H-L%, with the additional H- only 
surfacing in a “boost” to the scaling of the pitch accent. This approach would lead
to problems, however. On the one hand it is somewhat circular because it couldn't
independently postulate the presence of such “hidden” boundary tones only visible
through their pitch boosting behaviour except where scaling behaviour was seemingly
in need of an explanation. On the other, if these boundary tones are supposed
to be present systematically, then at least the first and second topic, and derecha
and izquierda, respectively, would be expected to be scaled at the same height.
Since they aren't, the systematic relation in their scaling still needs to be explained,
boundary tones or not, and parsimony would then suggest not stipulating a construct
such as “hidden” boundary tones if they do not help to explain a phenomenon.


Figure 80: OZ14_ELQUD_ES_43²⁰⁹ (el colibrí rojo está chupando a la flor roja y el colibrí verde está
chupando a la flor amarilla ‘the red hummingbird is drinking from the red flower and the green
hummingbird is drinking from the yellow flower’). Blue marking highlights high posttonic pitch
indicating the presence of H boundary tones.


209 https://osf.io/hp4fm/ 
This point is further corroborated when considering an additional example, Figure
80. Here, the scaling relation between pitch accents belonging to the different
parts of the utterance is less easily visible because the difference in pitch range
between accents lying on the same reference line is smaller, but it is still present. In
addition, it is different in that while in Figure 78, H boundary tones between topic 
and comment could not really be identified, they are unambiguously present here, 
identifiable from high pitch on the posttonic of rojo and verde, in addition to that 
between the first and second part, identifiable from high pitch on the posttonic of 
roja (all three marked in blue in Figure 80). However, here the presence or absence 
of boundary tones also does not seem solely responsible for the scaling of the sur- 
rounding pitch context: the high boundary tone in the three positions is preceded 
by a very high first topic, a less high first comment and a second topic that is in
between these two heightwise, respectively. The peaks on both comments follow 
a general downward trend, although the first one is followed by a high boundary 
tone, and the second by a low one, and the topics do not show this downward trend, 
although they are followed by a high boundary tone, just like the first comment. All 
this means that the binary identity of a boundary tone (H or L) is insufficient to
explain the observed scaling relation, which evidences more levels than just two 
and is partially independent of boundary tones. The quantitative analysis will cor- 
roborate this impression. 


5.2.3 Quantified analysis of hierarchical scaling structure via relational 
measurements 


For a broader view of things, let us look at how the scaling relations between indivi- 
dual units turn out under quantification. To that purpose, the double topic-utteran- 
ces from Elqud, i.e. responses by the ten speakers OZ14, QP44, QZ13, SG15, TP03, 
XJ45, ZZ24, ZE55, NQ01, and LJ22 in the experimental items 14, 26, 43, 44, 52, 59, and 
65 were annotated by hand and measurements were taken via a Praat script (see
Appendix C). Only those utterances qualifying as full responses were considered. 
The following annotations and measurements were taken: mean, minimum, and 
maximum F0 on every accentable syllable, representing scaling of pitch accents,
and on every syllable that is the final voiced one in one of the four parts of each 
utterance, i.e. in topic 1, comment 1, topic 2, and comment 2, respectively, repre- 
senting scaling of potential boundary tones there. The time stamps of all of these 
measurements were also taken. The word class of each word bearing an accenta- 
ble syllable was also annotated, as well as whether the word was contrastive in the 
utterance or not. For each utterance, a highest pitch measurement (maximum_utterance)
was determined that was not due to consonantal microprosody or noise. In the same
way, a lowest pitch for each utterance was determined (minimum_utterance). These
utterance maxima and minima served to derive normalized values for each measurement
taken on accentable and part-final syllables, according to the following
linear transformation adapted from Truckenbrodt & Féry (2015: 29):²¹⁰


(68) Transformation for normalized pitch values

$$\text{pitch value}_{\text{transformed}} = \frac{\text{pitch value}_{\text{measured}} - \text{minimum}_{\text{utterance}}}{\text{maximum}_{\text{utterance}} - \text{minimum}_{\text{utterance}}}$$
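
The transformation in (68) can be illustrated with a minimal sketch. The function name, the parameter names, and the example numbers are assumptions made for illustration only; the measurements in this study were taken with a Praat script and analysed in R, and the unit of the pitch values is not fixed by the sketch.

```python
def normalize_pitch(measured: float, utt_min: float, utt_max: float) -> float:
    """Rescale one pitch measurement to the range of its own utterance.

    0.0 corresponds to the utterance minimum and 1.0 to the utterance
    maximum, as in transformation (68).
    """
    pitch_range = utt_max - utt_min
    if pitch_range <= 0:
        # A flat or inverted range would make the ratio meaningless.
        raise ValueError("utterance maximum must exceed utterance minimum")
    return (measured - utt_min) / pitch_range


# Hypothetical values, chosen only to show the arithmetic:
# a measurement halfway between minimum and maximum maps to 0.5.
print(normalize_pitch(measured=180.0, utt_min=120.0, utt_max=240.0))
```

Because each measurement is divided by the range of its own utterance, two speakers with very different absolute pitch ranges both end up on the same 0 to 1 scale.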


Transformed pitch values are all on the same scale and thus become relative meas- 
urements that are comparable between speakers and utterances, even though indi- 
vidual items might differ substantially in their absolute values. Some further utter- 
ances had to be excluded from the final measurements because of insufficiently 
voiced pitch tracks or because they included too many false starts and hesitations 
or elided contrastive words. One utterance was also excluded because it was produced
in a different word order than all others. This leaves a total of 51 utterances
whose measurements were considered in the final analysis in R, distributed across 
speakers and experimental items as shown in Table 13. 


Table 13: Spanish Elqud utterances considered in the analysis, 
sorted according to speaker and experimental item. 


                    experimental item
speaker   14   26   43   44   52   59   65   total
OZ14      ✓    ✓    ✓    ✗    ✓    ✓    ✓    6
QP44      ✗    ✓    ✓    ✗    ✓    ✓    ✓    5
QZ13      ✓    ✓    ✓    ✗    ✓    ✓    ✓    6
SG15      ✓    ✗    ✗    ✓    ✗    ✓    ✗    3
TP03      ✓    ✓    ✓    ✗    ✗    ✗    ✗    3
XJ45      ✓    ✗    ✓    ✗    ✓    ✓    ✓    5
ZZ24      ✓    ✓    ✓    ✓    ✓    ✗    ✓    6
ZE55      ✓    ✓    ✓    ✗    ✗    ✓    ✓    5
NQ01      ✓    ✓    ✓    ✓    ✓    ✓    ✓    7
LJ22      ✓    ✓    ✓    ✗    ✗    ✓    ✓    5
total     9    8    9    3    6    8    8    51


210 Truckenbrodt & Féry (2015) use averages of values taken at predetermined positions for the 
minimum and maximum instead of utterance-specific measurements. Since the data in the present 
work are more spontaneous, the measurements were adapted in the way described.

As can be seen from the table, most speakers did not produce a full analyzable
utterance in response to all experimental items. Since item 44 was a filler and did 
not contain a mismatch between the visual and the audio stimulus, most speakers 
only responded with short responses along the lines of si está bien ‘yes it’s alright’, 
leaving only 3 full responses. 

In the following, I present pooled measurements from all utterances consid- 
ered for each of the experimental items alongside an individual example. The 
measurements are the transformed values of the mean measurements for the 
accented and part-final syllables, of which again the means across all utterances 
were taken. 
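
The pooling step behind the barplots in this section can be sketched as follows. This is an illustration under stated assumptions, not the R code used for the analysis: the function name, the position labels, and the toy numbers are hypothetical, and the sample standard deviation is assumed for the error bars.

```python
from statistics import mean, stdev


def pool_positions(utterances):
    """Return {position: (mean, sd, n)} pooled across utterances.

    `utterances` is a list of dicts, one per utterance, mapping a position
    label to the transformed mean pitch value measured there. Positions a
    speaker did not realize are simply absent from that utterance's dict.
    """
    by_position = {}
    for utterance in utterances:
        for position, value in utterance.items():
            by_position.setdefault(position, []).append(value)

    pooled = {}
    for position, values in by_position.items():
        # A standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else 0.0
        pooled[position] = (mean(values), sd, len(values))
    return pooled


# Hypothetical transformed values for two utterances of one item.
toy = [
    {"topic1": 0.8, "comment1_contr": 0.6},
    {"topic1": 0.6},
]
for position, (m, sd, n) in pool_positions(toy).items():
    print(position, round(m, 3), round(sd, 3), n)
```

Keeping the number of observations per position alongside the mean makes it visible when a bar rests on fewer utterances than the item total, as happens when speakers omit an element.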

Figure 81 gives these pooled values for the experimental item 14, the shortest 
and least complex one. In all the barplots in section 5.2.3, bar height represents 
mean across all utterances, error bars represent one standard deviation above and 
below the mean, respectively. In sections 5.2.3.6 and 5.2.3.7 I will discuss some of 
the variation represented by the error bars separately. Represented for topic one 
and two are the mean values for the accented syllable and the part-final, bound- 
ary-adjacent measurement. For the comments in part one and two, the value for 
the accented syllable of the noncontrastive word (the copular auxiliary verb está) 
and of the contrastive word (the verb in its present participle) as well as of the 
part-final measurement are given. Bars are given in the order of the position whose
mean values they represent. Inferential statistical analysis over the entirety of
the data follows in section 5.2.3.6. The
mean values here most notably show a considerable scaling difference between the 
first and the second part. They also indicate a general downward trend/downstep 
within each part that is counteracted by increased scaling/upstep on the contrastive 
pitch accented syllable which is also final. A scaling difference between topic and 
comment is less clearly represented, and only between topic and comment of the 
second part. 


5.2.3.1 ELQUD-item 14 

Figure 82 gives the utterance produced in response to this experimental item by 
speaker SG15. Overall it shows the scaling differences also observed in the pooled 
measurements. However, the contrastive verb in the second comment is here 
not produced with increased scaling/upstep, leading to a clearer differentiation 
between the second topic and comment via relative scaling.

ELQUD utterance 14, n=9 (barplot; y-axis: transformed pitch value)

Figure 81: Barplot of pooled transformed mean pitch values from all utterances of experimental item
14 from Spanish Elqud.


Example: [el hombre]topic1 [está]comment1_noncontr [durmiendo]comment1_contr [la niña]topic2 [está]comment2_noncontr
[comiendo]comment2_contr (cf. Figure 82).


Figure 82: SG15 ELQUD ES 14²¹¹ (el hombre está durmiendo la niña está comiendo ‘the man is sleeping
the girl is eating’).


211 https://osf.io/x9d4r/


5.2.3.2 ELQUD-item 26 


ELQUD utterance 26, n=8

Figure 83: Barplot of pooled transformed mean pitch values from all utterances of experimental item
26 from Spanish Elqud.


Example: [el perro]topic1 [está]comment1_aux [encima]comment1_P_contr [de la roca]comment1_N [el gato]topic2
[está]comment2_aux [debajo]comment2_P_contr [de la roca]comment2_N (cf. Figure 84).


Figure 83 gives the pooled values for responses to experimental item 26. The topics 
remain simple, while the comments are more complex compared to 14, with bars 
giving the mean values for the auxiliary copular verb está (aux), a contrastive 
prepositional noun (P), and a noncontrastive noun (N) in both comments.²¹² Note
that the word bearing the contrast (encima “above” and debajo “below” in OZ14's 
response, cf. Figure 84) here is prefinal in the comments, unlike in 14, where it 
was final. As in item 14, there is a clear scaling difference between the two parts 
which are also separated by a high boundary rise, but the difference between topic 
and comment more clearly emerges only in the second part. The first comment is 
initially scaled just a little lower than the first topic, but its final element is higher 
scaled/upstepped again, to the same level as the topic. Note again that this final 


212 Not all speakers produced each element in the second comment.

element is not the contrastive one in the first comment, which precedes it and is
scaled lower on average. 


Figure 84: OZ14 ELQUD ES 26²¹³ (el perro está encima de la roca y el gato está debajo de la roca ‘the
dog is above the rock and the cat is below the rock’).


5.2.3.3 ELQUD-items 43 and 44 

With experimental item 43, whose values are given in Figure 85, internal complex- 
ity increases yet again. While the topics in 14 and 26 consisted of a single noun, here 
they consist of a noncontrastive noun followed by an adjective that is contrastive. 
For comment 1, in addition to the auxiliary, values are given for the accentable 
syllable on the verb in the present participle, followed by a noncontrastive noun 
and its contrastive adjective. For comment 2, most speakers (except for OZ14 and
QP44) only realized the noncontrastive noun and the contrastive adjective, so the
values for the noncontrastive elements are represented in a single bar. In the first
comment, ZE55 did not produce the verb, NQ01 the auxiliary, and LJ22 neither of
them. Regarding scaling, in this item the differences between topic and comment in 
both parts are most pronounced. We also again see upstep or increased scaling on 
the contrastive or final (which cannot be determined here) accentable word in the 
subparts except for the last one. Unlike in the previous two items, comment 1 and 
topic 2 here are nearly at the same level, with topic 2 even scaled slightly higher. 
While this is a very slight difference, it is rather incompatible with a purely local 
downstep model. This is even more so when considering that the final boundary 
tones in both parts are scaled roughly to the same height, but after comment 1 this 
is followed by a continuation at the same or increased register level, while after 
topic 2 it is followed by a considerable drop and then decline. Their being at the 


213 https://osf.io/uzwa4/ 




same level, and topic 2 being scaled slightly higher, is however fully compatible 
with a model assuming a nested prosodic structure cued via these scaling relations, 
in which a first sister at the same phrasal level is scaled higher than following ones, 
such as represented in Figure 86 and proposed by Truckenbrodt & Féry (2015) for 
German. 


ELQUD utterance 43, n=9

Figure 85: Barplot of pooled transformed mean pitch values from all utterances of experimental item
43 from Spanish Elqud.


Example: [el colibrí]topic1_noncontr [rojo]topic1_contr [está]comment1_aux [chupando]comment1_V [la flor]comment1_N
[roja]comment1_A_contr [el colibrí]topic2_noncontr [verde]topic2_contr [la flor]comment2_noncontr [amarilla]comment2_A_contr
(cf. Figure 87).


If, as represented in Figure 86, the four information structurally determined sub- 
parts of the utterance are each assumed to form a phrase, and those in turn group in 
twos to form phrases corresponding to the two main parts which then come together 
to form the utterance as a whole, and this relation is reflected in tonal scaling as 
described, then this would perfectly explain the scaling difference observed. Note 
that this necessitates assuming more than two phrasal prosodic units of iP-level or 
above, that otherwise do not seem to be differentiated in their prosodic makeup. 
Thus such a model would require a recursive prosodic structure at the ip/IP-level, 
as discussed in section 3.6. Assuming a flat structure at the subpart-level, thus with
quaternary branching, could not explain the observed different scaling relations
between topic 1 and comment 1, and topic 2 and comment 2, on the one hand, and 
comment 1 and topic 2, on the other. A structure with ternary branching, taking 
comment 1 and topic 2 as a single iP, on the other hand, would clash with the 
attested boundary tones between these two parts. The reality of the largest phrasal 
unit, given as “iP3” in Figure 86, besides being necessary as the domain of full reset, 
is furthermore supported by the fact that it is closed by a low boundary tone, and 
that the last contrastive or final accent in it is not upstepped, as in all non-final 
phrases, but downstepped. 


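[Schematic: iP3 spans the whole utterance. It contains four bracketed subphrases:
topic1 (noun, adj), comment1 (noncontr1, noncontr2, contr/final), topic2 (noun, adj),
and comment2 (noncontr, contr/final). Topic1 and comment1 together form part 1;
topic2 and comment2 together form part 2.]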

Figure 86: Schematized scaling model with register lines and phrasal prosodic structure for complex 
Spanish double topic utterances from Elqud. 


We already saw an example of a speaker response to item 43 in Figure 80 above. 
Figure 87 gives another, by speaker XJ45. Note the somewhat unexpected increased 
scaling on the accented syllable of the noncontrastive noun of topic 1 and the 
boundary tone following the second topic.

Figure 87: XJ45 ELQUD ES 43²¹⁴ (el colu- colibrí rojo está chupando la flor roja y el colibrí verde la flor
amarilla ‘the red hummingbird is drinking the red flower and the green hummingbird the yellow
flower’).


ELQUD utterance 44, n=3

Figure 88: Barplot of pooled transformed mean pitch values from all utterances of experimental item
44 from Spanish Elqud.


Example: [perro]topic1_noncontr [negro]topic1_contr [está]comment1_aux [jugando]comment1_V
[con el pelota]comment1_N [roja]comment1_A_contr [perro]topic2_noncontr [blanco]topic2_contr
[con la pelota]comment2_noncontr [azul]comment2_A_contr (cf. Figure 89).


214 https://osf.io/ugehf/

Figure 88 gives the pooled values for item 44. They come from only three utterances
and so are prone to individual variation, but it is still interesting to see that
essentially the same scaling relations obtain as for item 43, even though responses 
to 44 are not partial denials, since no mismatch between visual and audio stimulus 
exists. Figure 89 gives an example of a response to item 44 by speaker ZZ24. 


Figure 89: ZZ24 ELQUD ES 44²¹⁵ (así es perro negro está jugando con el pelota roja y perro blanco con la
pelota azul ‘that’s it black dog is playing with the red ball and white dog with the blue ball’).


5.2.3.4 ELQUD-item 52 

Figure 90 gives the pooled values for responses to experimental item 52. Here in the 
first comment, speakers varied between a realization with a relative clause as in 
the audio stimulus, e.g. el picaflor verde está chupando la flor que está a la derecha 
‘the green hummingbird is drinking from the flower that is on the right’ (NQ01),
and a less complex version, e.g. as in QZ13's (Figure 78), which has el colibrí verde
está chupando la flor de la derecha ‘the green hummingbird is drinking from the
flower on the right’. In order to treat these uniformly, the second instance of the
auxiliary was not included in the values given in the barplot. In terms of scaling,
the results are very similar to those of experimental items 43 and 44, further supporting
the analytical consequences drawn from them. An interesting difference
concerns the accent on the noncontrastive noun in the first topic, which is scaled 
higher than the one on the contrastive adjective following it. This occurs variably 
(cf. the error bars) in utterances produced in response to all of the double topic- 
items, e.g. also in XJ45’s 43, seen in Figure 87. It could be something like a boost 
occurring only on the accent that is initial in the utterance as a whole, and would as 
such serve as an additional scaling cue to the overall prosodic structure. I discuss

ELQUD utterance 52, n=6

Figure 90: Barplot of pooled transformed mean pitch values from all utterances of experimental item
52 from Spanish Elqud.


Example: [colibrí]topic1_noncontr [verde]topic1_contr [está]comment1_aux [chupando]comment1_V [la flor]comment1_N1
[que está a la derecha]comment1_contr [colibrí]topic2_noncontr [azul]topic2_contr [está chupando la flor]comment2_noncontr
[que está a la izquierda]comment2_A_contr (cf. Figure 91).


this further in section 5.2.3.7. Above, we already saw QZ13's utterance produced 
as response to this experimental item (Figure 78); Figure 91 is another example 
produced by speaker QP44. 


5.2.3.5 ELQUD-items 59 and 65 

Finally, we will consider the results from experimental items 59 (Figure 92) and 
65 (Figure 93). Responses to these items were very similar, which reflects that this 
is also the case for their stimuli (cf. Table 11). The main difference in the stimuli is 
that for item 65, the location of the two skunks in both parts is the same (el zorrillo 
grande está al frente de la casa grande, y el zorrillo pequeño está al frente de la casa
pequeña ‘the big skunk is in front of the big house and the small skunk is in front
of the small house’) and not a mismatch between visual and audio stimulus, while
in 59, it is (el zorrillo pequeño está detrás de la casa pequeña, y el zorrillo grande
está al frente de la casa grande ‘the small skunk is behind the small house and the
big skunk is in front of the big house’). Thus in 59, but not in 65, the prepositional


Figure 91: QP44 ELQUD ES 52²¹⁶ (el colibrí verde está chupando la flor que está a la derecha y el colibrí
azul está chupando la flor que está a la izquierda ‘the green hummingbird is drinking from the flower
that’s on the right and the blue hummingbird is drinking from the flower that’s on the left’).


ELQUD utterance 59, n=8

Figure 92: Barplot of pooled transformed mean pitch values from all utterances of experimental item
59 from Spanish Elqud.


Example: [el zorrillo]topic1_noncontr [pequeño]topic1_contr [está]comment1_aux [al frente]comment1_P_contr
[de la casa]comment1_N [grande]comment1_A_contr [el zorrillo]topic2_noncontr [grande]topic2_contr [está]comment2_aux
[detrás]comment2_P_contr [de la casa]comment2_N [pequeña]comment2_A_contr (cf. Figure 94).

noun (P in the figures) in both comments is potentially contrastive, in addition to
the adjectives in both topics and comments. This is also marked in the labels below 
the bar giving the pooled value for the prepositional noun in the two figures. 


ELQUD utterance 65, n=8 (barplot; y-axis: transformed pitch value)

Figure 93: Barplot of pooled transformed mean pitch values from all utterances of experimental item
65 from Spanish Elqud.


Example: [el zorrillo]topic1_noncontr [grande]topic1_contr [está]comment1_aux [al frente]comment1_P
[de una casa]comment1_N [pequeña]comment1_A_contr [el zorrillo]topic2_noncontr [pequeño]topic2_contr [está]comment2_aux
[al frente]comment2_P [de una casa]comment2_N [grande]comment2_A_contr (cf. Figure 95).


Interestingly, this difference does not translate to an increased scaling on the prep- 
ositional nouns in the responses to 59 (Figure 92) compared to those to 65 (Figure 93). 
If anything, the opposite can be observed, namely that those values in the responses 
to 65 are actually scaled higher. This could suggest that it is not contrastiveness per 
se that causes increased scaling, but instead whether a pitch accent is final in a 
phrase. This would also be supported by the results from item 26 above. Apart from 
this somewhat unexpected result, the scaling relations between the main parts 
for responses to items 59 and 65 are very comparable to those for the other more 
complex items 43, 44, and 52. They are thus also very compatible with the hierarchi- 
cal scaling model proposed for 43 and only insufficiently explained by models that 
seek to explain downstep/declination only locally and without making reference
to a recursive prosodic structure. Figures 94 and 95 give examples of responses to
items 59 and 65, respectively, by speakers LJ22 and QZ13.


Figure 94: LJ22 ELQUD ES 59²¹⁷ (el zorrillo pequeño está al frente de la casa grande el zorrillo grande está
detrás de la casa pequeña ‘the small skunk is in front of the large house the large skunk is behind the
small house’).


Figure 95: QZ13 ELQUD ES 65²¹⁸ (el zorrillo grande está al frente de una casa pequeña y el zorrillo
pequeño está al frente de una casa grande ‘the large skunk is in front of a small house and the small
skunk is in front of a large house’).


5.2.3.6 Inferential statistical results 

The foregoing descriptive results strongly suggest that scaling on pitch accented 
syllables in these complex utterances is affected by a hierarchical prosodic struc- 
ture that itself reflects an information structural partition into two topics with their 
comments. To support this conclusion, inferential statistics were run on the transformed
pitch height data. They cannot replace or even fully replicate the linguistic
accounts competing to explain the scaling phenomena observed in the data, but
they can constitute evidence in favour of or against them. The rival accounts here
are: as a null hypothesis, pitch accent scaling is not actually affected by non-local 
relations and instead simply follows a declining course over the utterance, as orig- 
inally proposed for English by Liberman & Pierrehumbert (1984) and for Mexican 
Spanish by Prieto et al. (1996). Against that I have proposed as alternative hypothe- 
sis a recursive prosodic structure which affects pitch accent scaling non-locally via 
register levels (mapping to information structural partitions), similar to what has 
been proposed for German by Truckenbrodt & Féry (2015). To approximate these 
diverging accounts, a multiple regression model was fitted to the data with two cat- 
egorical variables and one continuous predictor variable. The model takes subpart 
of the utterance (with the four levels topic 1, comment 1, topic 2, and comment 
2), contrastiveness, i.e. whether an accentable syllable was part of a contrastive 
word or not, and normalized start time of the accentable syllable from the begin- 
ning of the utterance as predictor variables, and transformed mean pitch height on 
accentable syllables as dependent variable. In this regression model, the predic- 
tor variables of utterance subpart and contrastiveness represent the hierarchical 
scaling account, while syllable start time as predictor variable approximates the 
purely local declination account. According to the model, transformed mean pitch 
value on accentable syllables is significantly or even highly significantly affected 
by which of the utterance subparts an accentable syllable belongs to. It is nearly 
significantly affected by whether the word is contrastive or not. It is not significantly
affected, however, by the start time of the syllable.²¹⁹ The adjusted R² for the


219 These are the model results:

                                   estimate β   st. error β   t-value   p-value
intercept                          0.51         0.02          26.42     < 0.001 ***
start time of accented syllable    0.01         0.00          1.47      0.142
comment2                           -0.18        0.03          -7.11     < 0.001 ***
topic1                             0.13         0.02          5.09      < 0.001 ***
topic2                             -0.06        0.03          -2.51     0.0125 *
contrast on word: yes              0.03         0.02          1.61      0.107

F(5, 489) = 30.4, p < 0.001
Adj. R² = 0.229


The intercept in the model corresponds to the estimate for a syllable in a noncontrastive word in 
the first comment. While checking model assumptions, a few outliers were found that did not seem 
to be due to error and thus should be included. For that reason, a robust linear model using rlm in 
the R package MASS (Venables & Ripley 2007) with the same predictor and dependent values was

model is only about 0.23, meaning that only 23% of the variation in the transformed
pitch height data on accentable syllables is actually explained via the predictors 
included. That is not too surprising, since the recorded data are quite spontaneous 
and it is easy to see that there is a considerable amount of individual pitch height 
variation throughout the utterances from the examples we have seen so far. That 
the utterance subparts affect pitch height significantly despite this overall varia- 
bility probably speaks to the robustness of this effect. In order to further explore 
the two rival accounts, a simple model was also constructed using only start
time of the syllable as predictor. This yielded a significant result, but explained
just 10% of variation in the data.²²⁰ That this variable becomes insignificant in the
more complex model suggests that its effect becomes superseded by the effect that 
subpart has on transformed pitch height. This is further confirmed by applying 
regression models to the pitch height values in each utterance subpart separately, 
again with start time of the syllable as predictor. Those models did not produce 
any significant results except in the second topic, where a later starting time actually
implied a somewhat higher pitch value, against the expectations from the null
hypothesis.²²¹ Figures 96 and 97 provide visualisations of the transformed mean


also executed. A Levene's test for homogeneity of variance also returned significant results, indi- 
cating variances are not equal across the predictor groups. Therefore, robust model estimates and 
p-values were also calculated using the sandwich package (Zeileis 2004; Zeileis et al. 2020) and the 
coeftest function from the lmtest package (Zeileis & Hothorn 2002). Since results from all of these
robust approaches did not differ substantially from those of the non-robust original model (and in 
particular did not decrease significance of any of the predictors), only the results of the latter are 
reported here. 

220 These are the results for this simplified model:

                                   estimate β   st. error β   t-value   p-value
intercept                          0.61         0.02          38.41     < 0.001 ***
start time of accented syllable    -0.02        0.00          -7.4      < 0.001 ***

F(1, 493) = 54.8, p < 0.001
R² = 0.1


221 These are the results for the model predicting transformed mean pitch height on accentable
syllables via syllable start time only within the second topic:

                                   estimate β   st. error β   t-value   p-value
intercept                          0.36         0.05          7.25      < 0.001 ***
start time of accented syllable    0.03         0.01          2.96      < 0.004 **

F(1, 82) = 8.74, p = 0.004
R² = 0.1


Within the other three subparts of the utterance, the model did not produce significant results.

pitch value distributions per utterance subpart. The difference in the values per
subpart of the utterances is clearly visible, with those for the first comment and the
second topic basically at the same level in the longer utterances responding to items
43, 44, 52, 59, and 65. Yet we saw in the previous sections that there is a clearly identifiable
boundary pitch movement between those two subparts, so that they each must
correspond to a prosodic unit. Together, these results firmly support the conclusion 
reached from the inspection of individual examples and the pooled values for each 
experimental item, namely that the null hypothesis account explaining pitch height 
via local downstep or declination is insufficient, and that instead the hierarchical 
scaling account is far more capable of explaining what is observed. That the pitch 
height differences between the subparts are scaled differently depending on which 
subparts they separate cannot simply be the effect of a purely local downstep pro- 
gression, and is much more plausibly explained as due to a hierarchical structure 
with different boundary strengths. Regarding the result of contrastiveness not 
being a significant predictor of increased pitch height, this supports the suggestion 
already made in the discussion of items 26 and 65, that the factor responsible for 
the solidly observed upstepped values on accents in some words is perhaps less 
that they are contrastive, and instead that they are final in a phrase. 
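
To make the regression results in this section concrete, the following sketch does two things with the figures reported in footnote 219. It computes the pitch value the model predicts for a given syllable, and it checks that the reported F statistic and adjusted R² are mutually consistent. The function names are hypothetical, the coefficients are the rounded values from the footnote, and the unit of the start-time predictor is left as in the model, so predictions are approximate.

```python
# Rounded coefficients from footnote 219. The reference level (intercept)
# is a syllable in a noncontrastive word in the first comment.
INTERCEPT = 0.51
START_TIME_SLOPE = 0.01
SUBPART_EFFECT = {"comment1": 0.0, "comment2": -0.18, "topic1": 0.13, "topic2": -0.06}
CONTRAST_EFFECT = 0.03


def predict(subpart: str, contrastive: bool, start_time: float) -> float:
    """Predicted transformed mean pitch under the reported model."""
    contrast = CONTRAST_EFFECT if contrastive else 0.0
    return INTERCEPT + SUBPART_EFFECT[subpart] + contrast + START_TIME_SLOPE * start_time


def r_squared_from_f(f_value: float, df_model: int, df_resid: int) -> float:
    """Recover R² from an overall F test: F = (R²/p) / ((1 - R²)/df_resid)."""
    return f_value * df_model / (f_value * df_model + df_resid)


def adjusted_r_squared(r2: float, df_model: int, df_resid: int) -> float:
    """Adjusted R², with n = df_model + df_resid + 1 observations."""
    n = df_model + df_resid + 1
    return 1.0 - (1.0 - r2) * (n - 1) / df_resid


# Register difference between first topic and second comment at start time 0.
print(round(predict("topic1", False, 0.0), 2))   # 0.64
print(round(predict("comment2", False, 0.0), 2)) # 0.33

# Consistency check against F(5, 489) = 30.4 and Adj. R² = 0.229.
r2 = r_squared_from_f(30.4, 5, 489)
print(round(r2, 3), round(adjusted_r_squared(r2, 5, 489), 3))
```

The degrees of freedom imply 5 + 489 + 1 = 495 observations, which matches the sum of the per-subpart counts given for Figure 96 (84 + 181 + 84 + 146). The same formula applied to the simplified model, F(1, 493) = 54.8, gives an R² of about 0.10, in line with footnote 220.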


5.2.3.7 Discussion 

We have now seen fairly solid evidence that pitch accent scaling in these Spanish 
double topic-utterances from Elqud follows a hierarchical prosodic structure. In the 
following, I would like to explore some further issues mostly related to variability. 
From looking at individual examples as well as the length of the error bars on some 
of the barplots and the relatively low adjusted R² of 0.23 in the regression model, it
is evident that there is a considerable amount of individual variation and that not 
all individual utterances produce the same scaling relations. In the next section, 
we will look at a subset of utterances by a group of speakers in detail that display 
an accentuation pattern similar to what was described as “phrase accentuation” 
in section 5.1.3.1. Here I would like to concentrate instead on two other sources of 
variability. The variable tendency to scale the initial noun in the first topic particu- 
larly high, as an initial “boost”, was already mentioned. This is visible not only in 
individual examples but also in the barplots for items 52, 59, and 65, suggesting it 
is not an isolated quirk. It seems to stand in opposition to the tendency to increase 
relative scaling on the final accent in a subpart. The other variable tendency is 
to produce the very final accent, in the second comment, as either upstepped or 
downstepped relative to preceding ones. The particularly large standard deviations
on the pooled values for this position in virtually all the experimental items indicate
this variability, as do individual examples. For instance, XJ45 and LJ22 produce

ELQUD utterances 14, 26, 43, 44, 52, 59, 65 (violin and boxplots; x-axis: position (topic1, comment1,
topic2, comment2); y-axis: transformed mean pitch accent value)

Figure 96: Combined violin- and boxplots for the transformed mean pitch value measured on accentable
syllables in the double-topic utterances in Spanish Elqud, sorted according to utterance subpart. Violin
shape indicates density. Horizontal bar in boxplot indicates median, notches indicate confidence
intervals, box size interquartile range. Number of observations per utterance subpart: 84, 181, 84, 146.


the upstepped variant in their responses to items 43 and 65, respectively (Figures 
87 and 95), while QP44 and QZ13 produce the downstepped one in their 52 and 59, 
respectively (Figures 91 and 94). I would like to suggest an explanation for both of 
these sources of scaling variation, the initial “boost” and the final upstep/downstep,
in terms of conflicting functions. In both cases, upstepping the final element in
the phrase corresponding to an information structurally relevant unit, especially 
when it is contrastive, arguably aids in cueing precisely this information structural 
status: not only that the item itself is contrastive, but also the delimitation of the particular
subpart in which the accent is the final one. In contrast, “boosting” the initial element in
the utterance, and downstepping the final one, aids in cueing the delimitation of
the utterance as a whole. This is perhaps particularly relevant in these long utterances
with complex internal structure.²²² The cues for these two types of structural


222 Recall that we already saw two other utterances with a similar initial boost that seems to go 
against information structure, Figure 60 and Figure 61 in section 5.1.3.1.2. They are also compar- 
atively long. 
indication are naturally incompatible with each other, since scaling is always relative
and they have opposite effects on the same position (down- or upstepping the
final element in a subpart). It would be an interesting question for further research
to explore whether this hypothesis of conflicting structural cues holds up in
perception.
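
Because scaling is relative, the two variants of the final accent can only be told apart by comparing the final peak with the one before it. The following Python sketch makes this comparison explicit. It is only an illustration under stated assumptions: the function name `final_accent_relation`, the tolerance threshold, and the peak values in the usage lines are all hypothetical and are not measurements from the corpus.

```python
from typing import Sequence


def final_accent_relation(peaks: Sequence[float], tolerance: float = 0.5) -> str:
    """Label the last peak relative to the immediately preceding peak.

    `peaks` holds one pitch value per accent in a subpart (any consistent
    scale, e.g. semitones). `tolerance` is a hypothetical threshold below
    which two peaks are treated as being at the same level.
    """
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to define a relative scaling relation")
    diff = peaks[-1] - peaks[-2]
    if diff > tolerance:
        return "upstep"
    if diff < -tolerance:
        return "downstep"
    return "level"


# Invented values, for illustration only.
upstepped = final_accent_relation([10.0, 9.0, 11.5])
downstepped = final_accent_relation([10.0, 9.0, 7.0])
```

The same position thus receives opposite labels depending on which structural cue prevails, which is why the two cues cannot both be expressed on the final accent of a subpart.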

Another issue is the effect that item length seems to have. In the responses to 
items 14 and 26, the shortest and least complex utterances, we saw that the grea- 
test difference in register height was observed between the two main parts, while 
that between the four subparts only really emerged in the longer responses, those 
to items 43, 44, 52, 59, and 65. This difference comes out very clearly in the com- 
parison between the value distributions for the two item groups in Figure 97 (left 
vs. right). Furthermore, in 65 it was also the case that mean pitch height on the 
prepositional noun in the first comment seemed scaled up relative to the prece- 
ding auxiliary verb and the following noun, even though the prepositional noun 
here is neither contrastive (unlike in the responses to item 59) nor in final position 
in one of the four main subparts. That function words including the copulas are 
more likely not to be accented (cf. section 5.1.1.2) might also play a role here. But 
the upscaling of the noncontrastive prepositional noun nonetheless seems to be an 
indication that in a longer subpart, there is a tendency to form smaller units whose 
final elements are upstepped in order to create a somewhat alternating profile of 
prominence cued by pitch height, which would be rhythmically motivated. Both 
of these observations evidence an interplay of various factors influencing phra- 
sing: there seems to be a conflict between a tendency not to create a large amount 
of embedded high-level phrases (embodied in NONRECURSIVITY, cf. section 3.6.1), 
and one to cue information structurally relevant units and relations via prosodic 
structure as expressed through hierarchical scaling. Their interaction seems medi- 
ated via rhythm: at short utterance lengths, the ban on recursivity mostly prevails, 
but at larger ones and with increased internal complexity, rhythmic considerations 
must also come into play that favour a greater number of prosodically distinguis- 
hed units, because this leads to creating more eurhythmic prominence profiles (via 
regular contrasts in pitch height at both lower and higher levels of the prosodic 
structure). Together, this rhythmic demand and that for cueing information structural
relations faithfully then seem to overcome the resistance against recursive
prosodic phrasing. Recall that in section 3.6.4, results from earlier studies on
Spanish phrasing also pointed to an interplay of mapping-like factors and more 
genuinely prosodic/rhythmic ones. The results here effectively indicate that this 
interaction also takes place at higher levels of the prosodic structure in addition to 
pointing to the relevance of scaling as a cue for phrasing. 

The last issue I would like to raise here before moving on to the discussion 
of speakers' responses that exhibit phrase accentuation concerns the nature of the
categories that the prosodic units identifiable via register height scaling map onto.
In sections 3.6.3 and 3.6.4, the discussion already showed that assuming a tight syn- 
tax-prosody mapping is not necessarily the way to go when looking for prosodic 
recursion. I would like to argue here that the prosodic structure indeed cues par- 
titions relevant for information structural interpretation, and that it is difficult to 
claim that it maps onto syntax instead, using one of the speaker responses that are 
particularly elliptical. Figure 98 is LJ22's response to item 43 which we already saw 
in (65), with reference lines to show that it portrays the hierarchical scaling struc- 
ture even though overall pitch range is rather small. 


Figure 98: LJ22 ELQUD ES 43²²³ (el colibrí uh verde flor amarilla y el colibrí uh rojo uhm flor roja ‘the
green hummingbird yellow flower and the red hummingbird red flower’). Cf. (65).


This is an utterance in which syntax above the level of the NP has essentially been 
done away with, and yet prosodic structure, as evidenced by scaling and partially 
also boundary tones, still cues the relevant information structural relations between 
the referents. In both parts of the utterance, there is no verb that could project 
any syntactic structure above the noun phrase. Neither could therefore qualify as a 
“standard clause” because they lack “a predicate, and a locus for Tense”, which are 
seen as necessary ingredients in one account (Selkirk 2011: 452). Hence, the Match 
constraints from Selkirk (2011) would predict phonological phrases to form on each 
of the noun phrases colibrí verde, flor amarilla, colibrí rojo and flor roja, but not ips
or IPs (ι in Selkirk 2011) either on them or on the first and second part, respectively,
since those are strictly matched only to clauses (Selkirk 2011: 455). However, there
is a clear boundary tone (as evidenced by the presence of a sustained high pitch 
posttonically on amarilla) between the two parts, indicating that the units it separates
must have ip or IP status. By the scaling evidence, the noun phrases constituting
the topics and comments, respectively, are also phrased separately. It seems
then that in the absence of higher-level syntax, the provision in the Match (clause, ι)
constraint that it must be a clause that is matched to ι would itself have to be violable,
so that it can be outranked when there is a need to cue relevant information structural
categories (such as the topics and comments here) that are necessary for the
correct interpretation of the utterance. An utterance like this one is, strictly speaking,
syntactically defunct, but via the (scaling and other) cues provided
by prosody, el colibrí verde flor amarilla y el colibrí rojo flor roja becomes parseable
as a sequence of two larger units that are in turn composed of two units each. One
likely interpretation for such a grouping is that of topic1-comment1, topic2-comment2,
and as such, this grouping is distinct from other possible groupings (e.g.
not likely to become interpreted as a list of items of equal status) even without a 
specific context. However, the meaning relation between the elements identifiable 
as topics and comments can only be fully filled by the context.²²⁴ Even semantically,
the utterance is only fully interpretable in a context like this one where the crucial 
variable specifying the relation in which the topical and comment elements stand 
to each other can actually be filled without a verb. That is to say, this utterance is 
only interpretable without ambiguity because the preceding turn in the discourse, 
the provocation (Farkas & Bruce 2010), can here be assumed to be known to the 
hearer by the speaker. Presumably, if one is to follow Rizzi (1997), which Selkirk 
(2011) does, then what the prosodic structures map onto in this utterance are not 
categories from information structure (conceived as generalizations about means 
by which we keep track of referents and their relations in conversations, not just 
isolated sentences), but syntactic categories in the left periphery (projected from 
functional heads in an expanded complementizer domain), which are responsible 
for building up information structure for each sentence (a view that, if taken liter- 
ally, is not easily reconcilable with a view of information structure based purely 
on context as adopted here and detailed in section 3.7.2). In this view, these func- 
tional heads are there, whether overtly expressed or not, and their complements 
are clauses which satisfy the Match (clause, ι) constraint. However, this effectively
comes down to saying that a severely elliptical sentence such as in the utterance 


224 Utterances that should better be interpreted in terms of information structural relations in- 
stead of syntactic ones are actually not overly exceptional even in languages that have not usually 
been called “topic-prominent” in the somewhat dated typology by Li & Thompson (1976). To give 
one English example, a customer at a restaurant might tell the waiter the Colcannon, that's me in 
a situation where the waiter has arrived with several dishes at a table with several customers but 
does not remember who ordered what. In this situation everyone will understand the customer not 
to mean that they are a (sentient) stew, but merely that they ordered it. 
under discussion is syntactically the same as one that has fully expressed verbal
structure. Such a sentence-based view on information structure cannot explain that 
this utterance is perfectly understandable only in this discourse context (and not 
understandable in others). In such a view, the action that relates the topic referents 
to the comment referents would remain underspecified and the sentence would 
possibly have to be judged unintelligible because access to the discourse context
would be lacking. Thus, claiming that the syntax here is the same as in a corresponding
non-elliptical sentence would actually mean saying that syntax is largely irrelevant
for interpretation. Instead, I prefer a view in which both prosody and syntax
provide important cues for an information-structural interpretation of utterances 
in context, but where such an interpretation can also be shaped by only or mainly 
one of them in cases where the other is lacking. 


5.2.4 Variant double-topic utterances with phrase accentuation 


In this section we will specifically investigate a subset of the double-topic utterances 
that exhibit what was called “phrase accentuation” in section 5.1.3.1. In phrase
accentuation, unlike in the main variant of accentuation for the Huari Spanish data 
considered here, only phrase-final stressed syllables are fully pitch accented while 
preceding stressed syllables are either completely deaccented or their pitch span 
is severely compressed. The double-topic utterances we looked at so far all exhibited
the normal accentuation pattern in which nearly all accentable syllables are
indeed also pitch accented. Phrase accentuation in the corpus of double-topic utterances
is produced by three speakers, ZE55, NQ01, and XJ45. We will look at two of
their longest utterances, the responses to items 59 and 65, separating the discussion 
by speaker. In particular, we will explore how their divergent accentuation pattern 
interacts with hierarchical pitch scaling as a cue for a recursive prosodic structure.


5.2.4.1 ZE55's 59 and 65 

Speaker ZE55’s response to experimental item 59 is given in Figure 99. In the first 
topic, both the noncontrastive noun and the contrastive adjective seem to form 
phrases with a LH* pitch accent on the stressed syllable and a rising boundary 
tone, unambiguously identifiable at least after the adjective. Between them, there
is a reset to a low level that stays the same throughout the utterance. In the first
comment, only every second accentable word (atrás and pequeña) is actually pitch
accented, while the preceding accentable words (está and casa) seem to only form 
part of the initial low stretch before the rise to the accented syllable, so that in both 
cases, something like LH* is formed on the two words together. Pitch scaling also 

Figure 99: ZE55 ELQUD ES 59²²⁵ (el zorrillo hm grande está atrás de la casa pequeña y el zorrillo
chiquito está delante de la casa grande ‘the big skunk is behind the small house and the little skunk
is in front of the big house’).


follows an alternating pattern, but at a higher level: of the first two pitch accented 
elements (the two composing the topic), the boundary rise on the second is scaled 
much higher; similarly, of the two pitch accents composing the first comment, again
the second (on de la casa pequeña) is scaled much higher, but here it seems to be
additionally followed by a low boundary tone. This increased scaling corresponds 
to the division between the subparts, and the additional low boundary tone after 
the first comment seems to further demarcate the first main part as a whole. In 
contrast, pitch on the second topic (y el zorrillo chiquito) is basically level without 
distinguishing between stressed and unstressed syllables, but quite high overall, at 
the same level as the non-upstepped pitch accents in the utterance. This is compa- 
rable to the plateau realizations also already discussed in section 5.1.1.2. This kind 
of "suspended" realization clearly distinguishes the second from the first topic. The 
second comment is produced with effectively the same pattern as the first, forming
pitch accents on alternating accentable words, except that the final pitch accent
(on de la casa grande) is not increased in scaling. Note that we cannot observe any 
global declination or downtrend: high targets stick to either of two reference lines, 
the higher or lower one, whose values remain more or less constant throughout the 
utterance, and similarly, low targets also seem to be relatively fixated on a horizon- 
tal reference line that remains constant throughout the utterance, without decline. 
Figure 100 illustrates the scaling relations in the utterance. 

Figure 100 shows that register height here can only be said to distinguish 
between the two main parts: the highest reference line is reached only in the 
first main part. But apart from that, the subparts are additionally distinguished 

Figure 100: ZE55 ELQUD ES 59 with added reference lines for levels of scaling.


by further cues: the first topic is realized with two pitch accents or possibly even 
two rising phrases at the end of which the boundary tone is scaled high; the first 
comment ends on an additional low boundary tone after reaching the high ref- 
erence level; the second topic is realized as a plateau; and the second comment 
resumes the pattern of the first but without the upstep on the final accent. Thus 
a similarly complex prosodic structure is required for this utterance as for the 
double-topic utterances with “main” accentuation: minimally, each subpart must
form an ip/IP-level phrase, the two main parts form one at a level above, and then
the whole utterance forms another, even higher one. Additionally, a domain must 
be defined at which pitch accents are culminative. This is usually assumed to be the 
phonological phrase, to be distinguished from the prosodic word at which stress is 
assigned, thereby designating possible locations for pitch accentuation. Let us con- 
sider the distribution of boundary tones. A high boundary tone is unambiguously 
present after the first topic. Low boundary tones are present at the ends of the first 
and second part. It is not unlikely that there is also a high boundary tone at the end 
of the second topic, although this is not as unambiguous: the peak on chiquito here 
is reached already within the stressed syllable, but the high pitch extends into the 
posttonic, which could point to the existence of a high boundary tone, possibly sec- 
ondarily associated with the metrically prominent syllable (cf. Grice et al. 2000; Gus- 
senhoven 2000a, see also section 3.5.2). Assuming the presence of a boundary tone 
there has the advantage of providing us with a further argument for mapping each 
of the two topics and comments to prosodic units of the same level in a principled 
way, namely that of ip or IP, which then are the domains at whose edges boundary 
tones are assigned (cf. Gussenhoven 2004; Féry 2017). There are thus three levels 
of prosodic structure above the foot that are systematically distinguished in this 
utterance, the prosodic word (providing stress positions), the phonological phrase 
(at which pitch accents are assigned) and the ip/IP (at which boundary tones are 
assigned). However, as argued above, we then need more in order to represent all
the contrasts that are present here. The utterance differentiates clearly between 
the first and the second part via scaling: the highest peaks in the second part are 
globally downstepped with respect to the first part. This necessitates a mapping 
of each part to a further level of prosodic structure, and of the whole utterance to 
another, because otherwise the downstep taking place would have no domain with 
respect to which it occurs. However, because these levels do not specify additional 
tones, they should be assumed to be recursive instantiations of ip/IP/ι instead of
separate prosodic domains, following the same line of argumentation as employed 
in section 3.6 for other languages. 

Figure 101: Proposed recursive prosodic structure for utterance ZE55 ELQUD ES 59.


Figure 101 illustrates this proposed prosodic structure for the utterance ZE55
ELQUD ES 59. At the level of the prosodic word, stress is assigned, but this only
surfaces via pitch movement when the prosodic word is the last one in its pho- 
nological phrase. At the next level, phonological phrases are joined together to 
form minimal IPs, whose edges are associated with boundary tones, and which 
are cueing information-structurally relevant partitions into topics and comments. 
A scaling relation of upstep obtains at this level between the subordinate phono- 
logical phrases. One level above, the minimal IPs are joined together to form inter- 
mediate IPs, between which a downstep relation obtains at the highest level, that of 
the maximal IP. Under this proposal, at least the same number of contrasts is thus 
produced prosodically as with the “main” variant: scaling in the form of downstep
distinguishes between the two main parts, mapped to max IPs, and in the form of
upstep it also distinguishes between the less and more prominent parts within the
first topic and comment, those that create the contrast in reference between the 
first and second topic (zorrillo grande vs. chiquito) and first and second comment 
(casa pequeña vs. grande). Further contrasts are created by phrasing: the sepa- 
ration between topic and comment in both parts is achieved via the placement of 
boundary tones. And the phrasing of several prosodic words together into a single 
phonological phrase so that a pitch accent peak is only realized on the last one of 
them again creates a difference in prominence which cues information structure: 
the repeated parts of the verb and noun phrases in the comments (está and casa) 
are not separately accented and thus made less prominent than the ones which 
differ between the two parts (atrás vs. delante and pequeña vs. grande). Arguably,
phrasing also sets the first topic apart, because it consists of two phonological
phrases (one for each prosodic word), each with its own pitch accent, while everywhere
else, also in the second topic, one phonological phrase encompasses two prosodic
words. Contrast between the first topic and the first comment, and also
between the first and the second topic is thus signaled redundantly.
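
The layered structure described for Figure 101 can be made concrete as a small recursive data structure. The following Python sketch is only an illustration under stated assumptions: the class names (`PWord`, `PPhrase`, `IP`), the helper `pp`, and the simplification that each accentable word counts as one prosodic word (function words are omitted) are choices made here for illustration and are not part of the analysis itself. The grouping of words into phonological phrases follows the description of ZE55 ELQUD ES 59 in the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PWord:
    """Prosodic word: the level at which stress is assigned."""
    text: str


@dataclass
class PPhrase:
    """Phonological phrase: only its last prosodic word carries a pitch accent peak."""
    words: List[PWord]

    def accented(self) -> PWord:
        return self.words[-1]


@dataclass
class IP:
    """Intonational phrase that may contain further IPs (recursion) or phonological phrases."""
    label: str
    children: List[Union["IP", PPhrase]] = field(default_factory=list)

    def depth(self) -> int:
        """Number of stacked IP layers from this node downwards."""
        inner = [child.depth() for child in self.children if isinstance(child, IP)]
        return 1 + max(inner, default=0)

    def accented_words(self) -> List[str]:
        """Words that surface with a pitch accent, in linear order."""
        out: List[str] = []
        for child in self.children:
            if isinstance(child, IP):
                out.extend(child.accented_words())
            else:
                out.append(child.accented().text)
        return out


def pp(*words: str) -> PPhrase:
    """Hypothetical helper: build a phonological phrase from word strings."""
    return PPhrase([PWord(w) for w in words])


utterance = IP("maximal IP", [
    IP("intermediate IP (part 1)", [
        IP("minimal IP (topic1)", [pp("zorrillo"), pp("grande")]),
        IP("minimal IP (comment1)", [pp("está", "atrás"), pp("casa", "pequeña")]),
    ]),
    IP("intermediate IP (part 2)", [
        IP("minimal IP (topic2)", [pp("zorrillo", "chiquito")]),
        IP("minimal IP (comment2)", [pp("está", "delante"), pp("casa", "grande")]),
    ]),
])

print(utterance.depth())           # 3 stacked IP layers
print(utterance.accented_words())  # repeated material (está, casa) is not listed
```

In this representation, the difference between the first and the second topic falls out from the number of phonological phrases inside the respective minimal IP, while the three IP layers are instances of one and the same category.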


Figure 102: ZE55 ELQUD ES 65²²⁶ (el zorrillo grande está frente a la casa chiquita y el zorrillo chiquito
está f- frente a la casa grande ‘the large skunk is in front of the small house and the small skunk is in
front of the large house’).


ZE55's utterance in response to item 65 (Figure 102) is similar in many respects 
to that to item 59: scaling distinguishes effectively between three reference levels, 
two for high tones, of which the higher one is upstepped, and a bottom level for 
low tones. Overall declination, with regard to either high or low targets, does not
seem to take place at all. Phrase accentuation at the level of the phonological phrase 
distinguishes between individually accented prosodic words and groups of them
where only the final prosodic word has a pitch accent peak. The grouping is the 
same as in 59, even with respect to the difference between the first and second 
topic. While in the first, two phonological phrases are formed with distinct pitch 
accent peaks, the second consists of a single phrase, realized as a plateau with no 
pitch distinction between syllables. Phrasing at the level of the minimal IP is accomplished
via boundary tones and separates the two topics and comments. Essentially
the only difference from ZE55 ELQUD 59 lies in the scaling of the first topic and the
noun phrase of the second comment: while the rise at the end of the first topic in 
59 reached up to the highest reference line, here in 65 that line is marked only by 
the rises on the final noun phrases of the first, and also second, comment (which 
in 59 reached only to the downstepped reference line). Thus, here scaling does 
not distinguish between part one and part two. However, they are still contrasted 
overall with each other by the difference in their final boundary tones: part one 
here unambiguously ends in a high boundary tone, part two in a low one. This 
was not the case in ZE55 ELQUD ES 59, where both parts ended in low boundary
tones, but scaling clearly distinguished between them. In addition, the difference 
in phrasing between the first and the second topic, just as in 59, also highlights this 
contrast.²²⁷ Without a representation of an overarching domain (the maximal IP)
which can make reference to the units below it, such regular differences could not 
be created. If each part (one and two) were the maximal domain for the applica- 
tion of prosodic rules (or some kind of prosodic planning), the phonology would be 
blind to all of these phenomena. However, all three phenomena considered here
(scaling difference, boundary tone difference, phrasing difference between the two
topics) do require reference to both parts, and they seem to be interchangeable, or to share the
same functional load, to a degree. The conclusion I would therefore like to draw
is that regarding the levels of prosodic structure which they distinguish, the two 
utterances are essentially equivalent. But it seems that the means by which the 
prosodic groupings can be expressed are variable. ZE55's “phrase accentuation” 
variant is functionally equivalent to the “main” variant in that it encodes the same
prosodic information, but by different and even variable means. The tonal makeup 
of a phonological phrase in both variants is the same, with a H tone associating with 
the pitch accented syllable and an L tone occurring before it. 


227 The first and second topic in both ZE55 ELQUD ES 59 and 65 are (for me) very distinct in the
auditory impression that they produce: the presence (on the first) and absence (on the second) of
pitch accents can be clearly heard in both versions, as well as some difference in rhythm.
5.2.4.2 NQ01's 59 and 65


Figure 103: NQ01 ELQUD ES 59²²⁸ (el zorrillo pequeño está al frente de la casa grande y el zorrillo grande
detrás de la casa chica ‘the small skunk is in front of the large house and the large skunk behind the
small house’).


Figure 104: NQ01 ELQUD ES 65²²⁹ (el zorrillo pequeño está al frente (d)e la casa grande y el zorrillo
grande al frente (d)e la casa pequeña ‘the small skunk is in front of the large house and the large skunk
in front of the small house’).


Speaker NQ01's utterances in response to experimental items 59 and 65 are given in
Figures 103 and 104, respectively. These utterances (and equally 43 and 52 by the 
same speaker) share a number of similarities with those produced by ZE55, but they 
are also different in some respects. The central similarity is that they show pitch 
accenting behaviour where not every accentable word realizes a peak aligned with 
its stressed syllable, but where instead often several words are grouped together in
a single pitch accenting event. Sometimes, single prosodic words are also phrased 
separately this way. NQ01, just like ZE55, produces the first topic as two phrases 
(a peak on each prosodic word) in both 59 and 65, while the second topic, in both 
utterances, is phrased into one. However, while ZE55 produces peaks aligned with 
the stressed syllable of the last prosodic word in those phonological phrases that 
contain several prosodic words, NQ01 here mostly produces such units without 
any pitch peaks aligned with the stressed syllables, and instead with only a final 
rise which often begins after the last stressed syllable (e.g. casa grande and zorrillo 
grande in both her 59 and 65). As with ZE55, the extent of the phrase that is realized 
in this way is variable, extending over several words or just one, as in the first topic 
in 59, where both el zorrillo and pequeño are each realized in a separate phrase. 
However, whenever this is the case, the highest pitch peak seems to be reached not 
on the stressed syllable, but on the last one. It seems most intuitive to analyze the 
phrase-final rises in these pitch accent-less phrases as evidence for the presence of
boundary tones. This would seem to mean either assuming boundary tones at the 
edges of phonological phrases here, but not in ZE55, or taking each of these phrases 
to be immediately encompassed within a minimal IP, which then assigns bound- 
ary tones. The latter option, in addition to feeling somewhat unwieldy because it 
creates a lot of structure that does not do much, is also not desirable, because it still
needs to explain why there are no pitch accents on the metrically strong syllables. I 
instead propose to adopt the former option and again employ the notion of second- 
ary association. If we assume that the edges of phonological phrases are in princi- 
ple available for association with high tones (i.e., also in ZE55), then the difference 
between ZE55’s and NQ01's realization is reduced to a difference in whether it is
this edge or the metrically strongest position at which an available tone is most 
likely to manifest. This competition can be modeled as two conflicting constraints, 
and their difference in ranking then decides which position is associated primarily, 
and which secondarily, with the available H tone (see section 5.3 for the OT anal- 
ysis). In turn, this determines the pitch realization as mainly aligned within the 
metrically strong syllable or at the edge of the phrase. 
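
The logic of this competition can be sketched as a toy evaluation with strict constraint domination. This is only a minimal sketch under stated assumptions: the constraint names `H-TO-STRESS` and `H-TO-EDGE`, the two candidate labels, and the function `optimal` are hypothetical and invented here for illustration; they are not the constraints of the OT analysis in section 5.3.

```python
from typing import Callable, Dict, List

# A candidate names the position with which the available H tone is primarily associated.
Candidate = str  # "stressed_syllable" or "phrase_edge"


def h_to_stress(cand: Candidate) -> int:
    """Hypothetical constraint: one violation if H is not primarily on the metrically strongest syllable."""
    return 0 if cand == "stressed_syllable" else 1


def h_to_edge(cand: Candidate) -> int:
    """Hypothetical constraint: one violation if H is not primarily at the phrase edge."""
    return 0 if cand == "phrase_edge" else 1


CONSTRAINTS: Dict[str, Callable[[Candidate], int]] = {
    "H-TO-STRESS": h_to_stress,
    "H-TO-EDGE": h_to_edge,
}


def optimal(ranking: List[str], candidates: List[Candidate]) -> Candidate:
    """Strict domination: violation profiles are compared lexicographically in ranking order."""
    return min(candidates, key=lambda c: tuple(CONSTRAINTS[name](c) for name in ranking))


candidates = ["stressed_syllable", "phrase_edge"]

# Same constraints, opposite rankings (both rankings are assumptions for illustration).
ze55_like = optimal(["H-TO-STRESS", "H-TO-EDGE"], candidates)
nq01_like = optimal(["H-TO-EDGE", "H-TO-STRESS"], candidates)
```

With the first ranking the winning candidate realizes the H primarily on the stressed syllable, with the reversed ranking at the phrase edge, so that the inter-speaker difference reduces to the ranking alone while the tonal material stays the same.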

Analysing the differences between the speakers in this way allows us to keep 
a lot of the prosodic structure the same: in particular, we can keep assuming that 
there is a metrical structure by which the stressed syllables in NQ01's utterances 
are assigned more prominence than their neighbours, even if it does not surface 
in the pitch realization; the difference between the first and second topic can 
also be modeled in the same way as for ZE55: as two separate versus one single 
phonological phrases, but in both cases equally within only one minimal IP. 
Moreover, we can assume that the same tonal sequences exist on each phonological
phrase here at a level of phonology at which they are still unassociated (cf.
Torreira & Grice 2018), namely LH. For ZE55, these tones are associated as LH*,
with the leading L sometimes aligning and possibly associating with the stressed
syllable of the less prominent prosodic word.²³⁰ For NQ01, they are associated as
LH- or possibly L*H-. The existence of the L tone is here deducible from the presence
of an elbow in the pitch contours, most clearly seen in the second topics in
59 and 65, where pitch falls steadily until the end of y el zorrillo gran, from where 
it then sharply rises on the last syllable. Here, it looks as if this elbow is aligned 
with the end of the stressed syllable of the second word (which is taken to be the 
metrically strongest in the phrase). If this alignment turns out to be consistent, 
it could be seen as evidence for a (secondary) association of the L tone with this 
position. The difference between the two patterns is exemplarily visualized in 
Figure 105, where (a) shows the proposed association pattern for ZE55, and (b) 
that for NQ01. 

Figure 105: Proposed association model for “phrase accentuated” realizations by ZE55 (a, e.g. Figure
99) and NQ01 (b, e.g. Figure 104), given exemplarily using the phrases de la casa pequeña for ZE55
and y el zorrillo grande for NQ01. Solid lines between tones and positions in the structure indicate
association, dashed lines alignment.


A relevant question is whether the utterances by NQ01 can be argued to have the 
same complex prosodic structure as the “main variant” realizations and those by
ZE55. As a matter of fact, just taking the mean values per accentable syllable in 


230 This is motivated mostly theoretically because of the nature of the OT constraints involved. 
See section 5.3 for a short discussion. I'm claiming that associating the L is a possibility, but it is also 
possible that it just stays unassociated. 
NQ01's utterances, unlike ZE55's, actually produces scaling-based differentiation
between the subparts similar to that shown by the pooled values in section 5.2.3, 
albeit also with considerable drops in pitch within each subpart. This is potentially 
an indication that the differing levels of the L tones, if they are indeed associated 
with the end of the metrically strongest syllable, could cue the hierarchical scaling 
structure here. In fact, it is obvious that NQ01 treats the L tones differently from
ZE55: the pitch values of the local minima clearly decline overall as the utterance 
progresses, with partial reset achieved after the break between the first and second 
part, whereas with ZE55, they are all very clearly at the same low reference line. As 
far as I am aware, scaling structures have only been proposed on the basis of peak, 
i.e. H tone, height (cf. Ladd 1988, 1990, 1993, 1994, 2008; van den Berg et al. 1992;
Féry & Truckenbrodt 2005; Truckenbrodt & Féry 2015). 

But even leaving this aside, there is other evidence for a prosodic structure 
here that divides the entire utterance in two main parts, and those again in sub- 
parts corresponding to the topic and comment, respectively. This evidence consists 
in the boundary tones and their relative height. In both 59 and 65, the rise accom- 
panying the boundary between the two main parts produces the highest pitch in 
the utterance, and it is especially clearly at a higher level than that reached by 
the boundary tones separating the topic from the comment in both parts of both 
utterances. However, the division between part one and part two cannot be one 
between entirely independent domains, based on the argument already made that 
the topics in both parts are systematically phrased differently (and similarly for 
ZE55 and NQ01), and that their final boundary tones differ from each other, with
the end of the first part being bounded by a final high boundary tone, while the 
second one ends on a low one. There is also clearly no full pitch reset between the 
two parts and pitch in the second part declines much more steeply overall than 
in the first one. Note that the strongest boundary within the first part in both 
utterances here seems to separate the noncontrastive noun from the following 
contrastive adjective in the topic, rather than separating the topic as a whole from 
the comment. This is possibly another instantiation of the "initial boost" discussed 
before. 

There is a noteworthy difference between NQ01's 59 and 65 regarding phrasing
and scaling within the second part. In 59, the second topic is followed by a
high boundary tone and a short break, and then in the second comment, detrás is 
realized with what seems to be the finally rising movement produced on phonolo- 
gical phrases by this speaker. Detrás is thus realized in its own separate phonolo- 
gical phrase, standing out in the utterance, since otherwise, only in the first topic 
are individual prosodic words realized in separate phonological phrases. After 
detrás, the rest of the comment is realized as a single phrase, so that the phrasing
here is (detrás)(de la casa chica). In 65, on the other hand, the corresponding
stretch, al frente de la casa pequeña, which is also separated from the preceding
topic by a high boundary tone but not a pause, seems to be phrased as (al frente de
la casa)(pequeña), as evidenced by the final boundary rise at the end of casa, and
the realization of the phonological phrase contour on pequeña. This difference in
phrasing corresponds to the difference in information structure as evoked by the 
difference in stimulus mismatch between the two experimental items, with detrás 
contrastive in 59, but not al frente in 65 (cf. Table 11 in section 5.2.1). NQ01 here 
aligns the focused elements with right phrase edges, apparently determining that 
the best way to signal the correction against the audio stimulus (the presence of 
a [REVERSE] feature in the context, in Farkas & Bruce 2010; Roelofsen & Farkas 
2015's terms) is by cueing detrás to have highest prominence in the final comment 
in 59, even to the detriment of the following (also contrastive) chica, which is compressed,
deaccented, or dephrased here. Aligning focus with phonological phrase
edges is a strategy that is of course far more effective in the “phrase accentuation”
variant than in the “main variant”, where the right edge of virtually every prosodic
word is also aligned with a phonological phrase edge. Additionally, we saw that 
increased scaling on a prefinal contrastive word is perhaps dispreferred because 
this conflicts with upstepping the accent on the final word in a prefinal iP/IP. Thus, 
NQ01 here perhaps makes use of a possibility for cueing information structure 
that is less available in the main variant, indicating that the different variants are 
possibly not equally well equipped for prosodically expressing the same complex 
information structures. However, the difference is not always expressed in “phrase
accentuation”, as seen from ZE55, who does not differentiate between 59 and 65
in this way. 


5.2.4.3 XJ45’s 59 and 65 

XJ45 is the third speaker who exhibits phrase accentuation. We already saw his 
utterance in response to experimental item 43 (Figure 87), which exhibits only some 
of the features of phrase accentuation, and in section 5.1.3.1.3 he was shown to use 
phrase accentuation independent of the context conditions under which it occur- 
red with other speakers. Yet he varies the most between accentuation patterns in 
the double topic-utterances: while ZE55 and NQ01 produce all their examples with 
“phrase accentuation”, and all other speakers do not, XJ45 does both, with some utterances
exhibiting features of both patterns. Here, we will look at 59 (Figure 106) and 65
(Figure 107), and see what makes them phrase accentuated and yet different from 
the ones produced by ZE55 and NQ01. 
Figure 106: XJ45_ELQUD_ES_59²³¹ (el zorrillo pequeño está al frente de la casa grande y el zorrillo grande
está detrás de la casa pequeña ‘the small skunk is in front of the big house and the large skunk is
behind the small house’).




Figure 107: XJ45_ELQUD_ES_65²³² (el zorrillo pequeño está al frente de la casa grande y el zorrillo grande
está frente de la casa pequeña ‘the small skunk is in front of the big house and the large skunk is in
front of the small house’).


As with ZE55 and NQ01, in XJ45’s utterances the two topics are realized differently, 
with the second one being clearly realized with a single rising pitch movement, 
while in the first one, there is a remarkable final rise on the noncontrastive el zor- 
rillo in both utterances, indicating that the topic is here produced with a phrase 
boundary after the first prosodic word, apparently in two further instances of 
the “initial boost” phenomenon. In general, the pitch movements realized on the 
various parts seem similar to those observed for NQ01: pitch is low for most of the 
phrase and then rises to a peak at the end of the last syllable, with an elbow realized
at or very close to the stressed syllable of the last word. 


231 https://osf.io/h4wgn/ 
232 https://osf.io/sb3v9/ 
This seems to be the case regardless of whether the last syllable is stressed
or not. I therefore take these movements to reflect the same association pattern 
as that analysed for NQ01 and exemplified in Figure 105 b) above, with an H tone 
associating with the right edge of the phonological phrase and an L tone preceding 
it. The extent of these phonological phrases is also similar: they cover at least one 
prosodic word, but also more than that: arguably, in 59, al frente de la casa grande 
in the first comment and detrás de la casa pequeña in the second are each realized
within a single phonological phrase which would then extend over three prosodic 
words each. Although XJ45 seems to phrase somewhat larger chunks together than 
NQ01, he also has in common with her how he separates topic from comment and 
part one from part two by phrasing, with high boundary tones at the end of each 
of these units. However, not everything is as with NQ01: while she consistently pro- 
duces final peaks whose rises do not even begin in the penultimate syllable even 
if that syllable is stressed, with XJ45, the stressed penult sometimes clearly takes 
part in the rise or even forms its own peak. Notably in both utterances there is a 
shoulder on the stressed syllable of grande at the end of the first comment and a 
peak on the stressed syllable of pequeña in the second, where it is then followed
by a fall to a final low target, indicating a final low boundary tone. In this respect 
then, so to speak, XJ45 seems somewhat in between NQ01 and ZE55. Based on these 
intermediate cases, a more flexible association model seems to be at work here, one 
that allows the H tone assigned by the phonological phrase to associate both with 
the edge and the most prominent syllable. Where a separate peak is reached in the 
stressed syllable (on grande at the end of the first comment and on pequeña at the 
end of the second) there is also very likely a boundary tone following belonging to 
a higher level (at least a minimal IP; H at the end of the first comment and L at the 
end of the second). That suggests that the constraint aligning the higher-level tone 
with the phrase edge outranks that aligning the lower-level tone with it, so that the 
latter tone has to move to the next closest available TBU, the final stressed syllable. 
This situation is schematically illustrated in Figure 108. Here the unassociated tonal 
sequence is LHL or LHH, with LH being the tones assigned by the phonological 
phrase, and the final L/H being an addition, assigned by the minimal IP. 

In this proposal, the presence of a final IP-level L or H has the effect on the preceding
phrase-level H that the latter now aligns and associates with the strongest syllable. This
means that the tones in the sequence associate and align in a ranked hierarchy, with the
higher-level tone having precedence. In a non-IP-final phonological phrase, for XJ45 this
leads to the phrase-level H associating with the boundary and the phrase-level L with the
preceding prominent syllable, but in this final context, because there is an IP-level L or H
to its right that belongs to a higher level and occupies the position at the phrase edge, the
phrase-level H has to stand down, so to speak, and now associates with the strongest syllable. For ZE55,
the issue mostly does not arise, because here, the association with the strongest 
Figure 108: Proposed alignment and association model for the last phrase in the first comment of 59
(a) and the second of 65 (b) by XJ45. Solid lines between tones and positions in the structure indicate 
association, dashed lines alignment. 


metrical position outranks association with the edge of the phonological phrase for
the phrase-level H tone. This leads to the observed association pattern in non-final contexts,
and in final contexts, at least on (pro-)paroxytones, no conflict arises, because the
additional IP-level L or H to the right of the phrase-level H aligns with the TBU at the right
edge of the minimal IP, which is free. The alignment of the IP-level tone is still predicted to
outrank the association of the phrase-level tone with the strongest position, but this would only
have observable effects on final oxytones, for which see section 5.3.3. For NQ01, on the other hand,
the situation is a little different again: in her 59, the final phrase of the utterance,
de la casa chica, seems to realize just a final fall, so that a final IP-level L tone, aligned with
the right edge of the IP, seems likely. In her 65, however, the final phrase is produced
like any non-final one, with a final rise on the last syllable. Here, there does not
seem to be a final IP-level L tone aligned with the edge of the IP (instead, the phrase is
produced in the lowest pitch register). I propose that tones assigned by the phonological
phrase have the same alignment and association ranking for NQ01 and XJ45: the
phrase-level H tone seeks to be rightmost. The difference lies in what happens when additional
boundary tones turn up. For XJ45, their constraints outrank those of the phonological
phrase tones. For NQ01 in 59, the faithful realization of the IP-final tone also
seems to win out, with the phonological phrase tones not surfacing, while in 65,
they instead seem to prevail. In both cases, the phrase-level H tone is not realized further to the
left, associated with the strongest metrical position, as in XJ45's case. Why this difference
occurs between NQ01's 59 and 65 could perhaps be answered by referring
to the different information structures the two utterances have. The elements in 59
after the separately phrased and contrastive detrás might be dephrased, suppressing
any following phonological phrase tones. This would allow the IP-level L to surface.
In this way, no constraint rerankings are needed between the utterances: in both
59 and 65, the association and alignment of the phrase-level H outrank that of iP/IP-level tones,
resulting in the final rise in 65, but in 59, the PhP-tones are simply not present. All
of this will be analyzed in OT in section 5.3.
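
The interaction of ranked constraints described in this subsection can be made concrete with a small evaluation sketch. It is only an illustration under simplifying assumptions and not the OT analysis of section 5.3: the constraint names (ALIGN_R_IP, MAX_PHI, ALIGN_R_PHI, ASSOC_STRONG), the candidate set, and the three rankings are hypothetical labels introduced here to mirror the prose, and final oxytones (where the stressed syllable and the phrase edge coincide) are not modelled.

```python
from itertools import product

def gen(ip_tone_present):
    """Enumerate candidates: where the phrase-level H lands, and whether
    the IP-level boundary tone surfaces at the right edge."""
    h_options = ["edge", "stressed", None]
    t_options = ["edge", None] if ip_tone_present else [None]
    for h, t in product(h_options, t_options):
        if h == "edge" and t == "edge":
            continue  # assumption: only one tone can occupy the edge position
        yield {"h_phi": h, "t_ip": t}

# Hypothetical constraints; each returns the number of violations (0 or 1).
CONSTRAINTS = {
    "ALIGN_R_IP": lambda c, ip: int(ip and c["t_ip"] != "edge"),
    "MAX_PHI": lambda c, ip: int(c["h_phi"] is None),
    "ALIGN_R_PHI": lambda c, ip: int(c["h_phi"] != "edge"),
    "ASSOC_STRONG": lambda c, ip: int(c["h_phi"] != "stressed"),
}

# Hypothetical rankings (highest first), read off the prose description.
RANKINGS = {
    "XJ45": ["ALIGN_R_IP", "MAX_PHI", "ALIGN_R_PHI", "ASSOC_STRONG"],
    "ZE55": ["ALIGN_R_IP", "MAX_PHI", "ASSOC_STRONG", "ALIGN_R_PHI"],
    "NQ01_65": ["MAX_PHI", "ALIGN_R_PHI", "ALIGN_R_IP", "ASSOC_STRONG"],
}

def evaluate(ranking, ip_tone_present):
    """Return the candidate with the best violation profile under strict ranking."""
    def profile(candidate):
        return tuple(CONSTRAINTS[name](candidate, ip_tone_present) for name in ranking)
    return min(gen(ip_tone_present), key=profile)

for speaker, ranking in RANKINGS.items():
    print(speaker, evaluate(ranking, False), evaluate(ranking, True))
```

Under these assumed rankings, the sketch reproduces the three patterns described: for XJ45 the phrase-level H sits at the edge in non-final phrases but retreats to the stressed syllable when an IP-level tone claims the edge; for ZE55 it is on the stressed syllable in both contexts; and for NQ01's 65 it keeps the edge while the IP-level tone does not surface.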

Another point of divergence between XJ45 and NQ01 concerns the scaling of 
boundary tones. As seen before, NQ01 scales the boundary tones separating the two 
main parts highest, and those separating between topics and comments lower. XJ45, 
on the other hand, does not seem to do this. In 59, the boundary separating the first 
topic noun from its adjective is highest, and the following boundaries, except for 
the final one, seem more or less scaled at the same level. In 65, all boundaries except 
the final one are at roughly the same level. Some kind of downstep-like scaling does 
seem to be taking place, but it occurs on the stretches between the boundary tones: 
in 65, this results in a downstep progression of roughly equal height between the 
four subparts. In 59, it looks as if this downward progression reflects the embedded 
structure in a similar way to the one seen in the pooled analysis as well: the first 
comment is lower than the first topic, but the second topic is somewhat higher 
again but not as high as the first, and the second comment is then lowered again 
relatively to that. 


5.2.5 Comparison between the variants 


This section compares the variation between the variants discussed individually. 
The variation assumed in the prosodic structures is given in Figure 109. It provides 
the prosodic structure for three utterances, 65 by QZ13 (A), 59 by ZE55 (B) and 65 
by XJ45 (C), which are supposed to exemplify the range of variation in the most 
relevant aspect, namely the grouping of prosodic words to phonological phrases. 
The structures A-C demonstrate that at the level of prosodic structure, the “main”
variant is not very different from the “phrase accentuation” variants. The only real
difference lies in the grouping within the comments, and here QZ13's and ZE55's
versions stand together against that by XJ45, cutting across the divide between
“main” and “phrase accentuation” variants. For QZ13's 65 and ZE55's 59, I have analysed
the verb together with the preposition (está al frente/atrás/delante) as phrased
into one phonological phrase, and the prepositional complement (de una/la casa
pequeña/grande) into another. This is because for the main variant it is suggested
by the scaling in QZ13's 65 and 52 (cf. Figures 95 and 78, respectively): the peak on 
está is slightly lower than that on al frente (or chupando in 52), while the first peak 
of the prepositional complement is lowered with respect to both of them, hinting 
at a further level of embedded scaling. For ZE55, it is suggested straightforwardly
by the presence of a pitch accent on the preposition (atrás/delante) but not on the 
verb (just as on the following adjective but not its preceding noun), suggesting their 
being grouped together (cf. Figure 99). No evidence for any such differentiation is 
found in XJ45's 65, where both comments are entirely phrased into one phonological
phrase (and then directly into a minimal IP), as evidenced by there being only
one prominent pitch movement over that whole stretch, the final rise (cf. Figure 
107). I do not necessarily want to claim that the more complex phrasing as analysed 
for QZ13’s utterances here is valid for all utterances in the “main” variant or for all 
except the ones by XJ45. I think it is quite likely that some of the other speakers also 
produce the more simple grouping. In this sense, the structures A-C just serve to 
exemplify these two options. 

The treatment of the relationship between prosodic word and phonological 
phrase deserves some justification. Another analytical option would have been 
to take the prosodic word as the domain at which pitch accents are assigned, as 
is proposed by Hellmuth (2007) for Egyptian Arabic, which realizes pitch accents 
on nearly every content word (similar to the “main” variant here). However, for
the “phrase accentuation” variant, the consequence would be either to treat the
much larger units in which only a single pitch event occurs (the entire comments
in the case of XJ45, for example) as a single prosodic word, lumping together
several content and function words in any order, or to say that pitch assignment
happens at the prosodic word level only in the “main” variant, but at the phonological
phrase level in the “phrase accentuation” variant. Neither option seems
appealing, the first because we could not then associate the L tone in the “phrase
accentuation” variant with the stressed position of a non-final prosodic word
without giving up on culminativity or making the prosodic word again recursive,
and because a prosodic word would then consist of an accentable content word
plus clitics in the “main” variant, but of several accentable content words plus
clitics in the “phrase accentuation” variant, without any good reason. The latter
option, on the other hand, seems (at least to me) inelegant and somewhat ad hoc.
What is more, for other varieties of Spanish that normally also pitch accentuate
nearly every word (i.e., like the “main” variant here), it has been shown that deaccentuation,
in the sense of an accentable content word not being realized with its
own pitch accent, does happen in certain contexts, but it is strongly dispreferred 
when that would mean deaccenting an entire phonological phrase (Rao 2009: 18), 
and that deaccented words in the majority still maintain longer duration and/or 
higher intensity on the stressed syllable (Ortega-Llebaria & Prieto 2011; Torreira 
et al. 2014). This supports the established view that stress is a property at the level 
of the prosodic word, distinct from pitch accent as a property of the phonological 
phrase, and that this difference particularly holds out in such contexts where individual
words are not accented. Stress is culminative at the level of the prosodic
word as defined in section 3.3.2: at least one and only one syllable must be stressed
in a prosodic word. For accent we can at least assume a minimality condition that
there must be at least one pitch accent in a phonological phrase, but possibly more
than one. I would like to follow this view here and therefore assume that also in
the “main” variant, pitch accent assignment happens at the phonological phrase
level.

el zorrillo pequeño está al frente de la casa grande y el zorrillo grande está frente de la casa pequeña


Figure 109: Prosodic structures for QZ13_ELQUD_ES_65 (A), ZE55_ELQUD_59 (B), and XJ45_ELQUD_65
(C), intended to exemplify the range of variation in these structures for the Spanish “double
topic” constructions, including the “main” variant (A) and the different variants within the “phrase
accentuation” variant (B, C).

This leads to the analysis of what is the main point of difference between the 
“main” and “phrase accentuation” variants. At the level of the phonological phrase,
tones are assigned. Their identity is determined by the utterance type, so that based
on the evidence gathered in this work, I propose that for declarative IPs, the basic
tonal sequence is LH. How many LH sequences are assigned per phonological
phrase is determined by reference to the metrical structure, and I propose that this
is the main difference between the “main” and “phrase accentuation” variants. In
the “main” variant, the relevant grid level is that of the prosodic word: the number
of grid marks at that level determines how many LH sequences are provided for
each phonological phrase. In the “phrase accentuation” variant, the level selected
by tonal assignment is the next higher, that of the phonological phrase, making
pitch accent culminative at that level in this variant in addition to being minimal.
Hence, only one LH sequence is assigned for each phonological phrase in the
“phrase accentuation” variant, while potentially several are assigned in the “main”
variant. How these sequences are then aligned and associated is a difference again
cutting across the divide between the variants: for the “main” variant as well as for
ZE55, the most highly ranked association position for the H is the strongest metri- 
cal position (and then the subsequent next-strongest metrical positions, i.e. all the 
stressed syllables, in the “main” variant), while for NQ01 and XJ45, alignment with 
the edge of the phonological phrase itself is ranked higher (see the discussion in 
section 5.2.4). Figure 110 gives a schematic overview over this difference in tonal 
assignment. 
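
The proposed difference in how many LH sequences a phonological phrase receives can be stated as a short sketch. The function name `assign_lh` and the list-of-lists representation are assumptions made here for illustration; the phrasing of the example comment follows the grouping analysed above for QZ13 and ZE55, with each inner list standing for one phonological phrase and each string for one prosodic word.

```python
def assign_lh(phrases, variant):
    """Return the LH sequences assigned to each phonological phrase.

    phrases: list of phonological phrases, each a list of prosodic words.
    variant: "main" (one LH per prosodic-word grid mark) or
             "phrase_accentuation" (exactly one LH per phonological phrase).
    """
    if variant == "main":
        return [["LH"] * len(words) for words in phrases]
    if variant == "phrase_accentuation":
        return [["LH"] for _ in phrases]
    raise ValueError(f"unknown variant: {variant}")

# Hypothetical input: a comment phrased as (está al frente)(de la casa pequeña).
comment = [["está", "al frente"], ["de la casa", "pequeña"]]

print(assign_lh(comment, "main"))                 # two LH per phrase
print(assign_lh(comment, "phrase_accentuation"))  # one LH per phrase
```

The sketch covers only the number of tonal sequences. Where each H and L is aligned or associated is a separate matter that, as described above, cuts across the two variants.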

The first topics in the “phrase accentuation” variant take up a somewhat special
role here. As already discussed, in the “phrase accentuation” utterances, they are
produced with prominent pitch movements on both prosodic words (the noun and 
the adjective), unlike the second topics, that are produced within a single pitch 
movement. As can be seen from Figure 109 (B & C), this has been analysed as the first 
topic consisting of two phonological phrases, one for each prosodic word, while in 
the second topic, the two prosodic words are contained in one phonological phrase. 
This is similar to the “initial boost” on the noncontrastive noun of the first topic that
we saw also in the pooled data. This would further support the suggestion that there 
is a functional equivalence between scaling-based cues and alignment-based cues 
to prosodic phrasing. 
Figure 110: Difference in tone assignment between the “main” and the “phrase accentuation”
variants. Phrase accentuation variant I is the one exemplified by ZE55, variant II that by NQ01 & XJ45.


Table 14 summarizes the different cues for the prosodic structure across the vari- 
ants discussed. The cues by which intonation separates the double-topic utterances 
into chunks corresponding to the information structure are given, ordered after 
variants and speakers. Additionally, the table lists the tone assignments and associ- 
ations at different levels of the prosodic structure. It is claimed that effectively very 
similar prosodic and metrical structures underlie all the variants here, and that 
what the table summarizes are just the different cues by which these structures are 
expressed in the signal. The structures are so similar because the prosodic phrasing 
in all cases serves the function of cueing information structural partitions: in the 
responses to the more complex items (43, 44, 52, 59, 65), material corresponding 
to the two topic domains, and each of the two comment domains, is phrased as 
a minimal IP. Each of the two main parts comprising a topic-comment sequence 
corresponding to an individual assertion answering a sub-QUD is mapped to an IP
one level below maximal (max-1), and the entire complex assertion answering the 
superordinate QUD is mapped to a maximal IP domain. Table 14 summarizes the 
evidence that all of these levels are signaled prosodically in all variants observed, 
although the means by which they are signaled vary. The responses to the less 
complex items 14 and 26 are disregarded here. As discussed in section 5.2.3.7, it 
seems likely that constraints against too complex prosodic structures win out in 
them because of their relative shortness. 
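
The mapping from information structure to prosodic levels described here can be sketched as a nested structure. This is a simplified illustration, not part of the analysis itself: the function names and the dictionary layout are assumptions, the level labels "IP-max", "IP-max-1" and "IP-min" are shorthand for the maximal IP, the IP one level below maximal, and the minimal IP, and the conjunction y is left out.

```python
def map_to_prosody(parts):
    """Map (topic, comment) pairs, one per sub-QUD, to nested IP levels."""
    return {
        "level": "IP-max",  # complex assertion answering the superordinate QUD
        "children": [
            {
                "level": "IP-max-1",  # one assertion answering one sub-QUD
                "children": [
                    {"level": "IP-min", "role": "topic", "text": topic},
                    {"level": "IP-min", "role": "comment", "text": comment},
                ],
            }
            for topic, comment in parts
        ],
    }

def count_levels(node, counts=None):
    """Count how many constituents of each level the structure contains."""
    counts = {} if counts is None else counts
    counts[node["level"]] = counts.get(node["level"], 0) + 1
    for child in node.get("children", []):
        count_levels(child, counts)
    return counts

item_59 = map_to_prosody([
    ("el zorrillo pequeño", "está al frente de la casa grande"),
    ("el zorrillo grande", "está detrás de la casa pequeña"),
])
print(count_levels(item_59))  # {'IP-max': 1, 'IP-max-1': 2, 'IP-min': 4}
```

The sketch stops at the minimal IP. The grouping of prosodic words into phonological phrases below that level is where the variants differ, as discussed for Figure 109.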
In the upper part of Table 14 (white background), the phonetic cues that are
used in the different variants to signal the prosodic structure(s) given in Figure 
109 and discussed just before, are given and compared. Since the table attempts 
to summarize a sizeable chunk of the results of this chapter, it necessarily uses 
somewhat compressed language. It should therefore be read as follows: phrasing 
refers to prosodic grouping of segmental material, in the case of prosodic cons- 
tituents at or above the minimal IP this might be made observable via bound- 
ary tones (in the case of NQ01 and XJ45 also at the level of the phonological 
phrase); in the case of phonological phrases, it mostly occurs via an alignment 
of the locally most prominent constituent (the most prominent prosodic word in 
the phrase) with the right edge of the phonological phrase, which in turn might 
be visible either via upstep on the peak of that prosodic word (sometimes in the 
“main” variant) or by having a pitch peak realized only on the stressed syllable 
of that prosodic word, but not others, in the phrase (the “phrase accentuation”
variant). In a particular case, phrasing also refers to the systematically differing
realizations (with two vs. with one pitch peak) of the two topics in the “phrase
accentuation” variant. Scaling is of course also seen as cueing phrasing in the
sense of prosodic grouping and separation, but since it is of particular interest
here, it is treated separately in the table. Scaling as systematically employed differences
in relative pitch height is seen as reflecting pitch height orientation of a
prosodic constituent along an abstract reference line and is expressed either via
the relative height of pitch accent peaks (in the “main” variant) or of boundary
tones (occasionally in the “phrase accentuation” variant).


Table 14: Prosodic means employed for the separation of the utterances ELQUD ES 43, 44, 52, 
59 and 65 into parts corresponding to their discourse and information structure (upper part, 
white background), and tone assignment, alignment and association behaviour (lower part, grey 
background), separated according to variants. 


| | “main” variant | “phrase accentuation”: ZE55 | “phrase accentuation”: NQ01 | “phrase accentuation”: XJ45 |
|---|---|---|---|---|
| Separation into part I and II | Scaling; phrasing; boundary tones | Phrasing via L boundary tones; split topic 1 vs. unsplit topic 2 | Phrasing via highest scaled boundary tones; split topic 1 vs. unsplit topic 2 | Phrasing; split topic 1 vs. unsplit topic 2 |
| Separation between topic and comment | Scaling (also boundary tones for some speakers) | Phrasing via H boundary tones | Phrasing via next-highest scaled boundary tones | Phrasing via boundary tones (also scaling) |
| Separation within topic and comment | Scaling (downstep within minimal IP, upstep on last ω); alignment of strongest ω with right edge of φ | Phrasing into separate φ; alignment of strongest ω with right edge of φ; scaling (upstep on last ω in topic 1 and comment 1) | Boundary tones of φ; alignment of strongest ω with right edge of φ | Boundary tones of φ; alignment of strongest ω with right edge of φ |
| φ-level tones | LH for each ω in φ | LH once per φ | LH once per φ | LH once per φ |
| | H associated as H* with σ' in each ω; L as leading tone (unassociated) | H associated as H* with σ' in strongest ω in φ; L as leading tone or associated to an available prefinal σ' | H aligned with right edge of φ; L as leading tone or associated with σ' in strongest ω in φ | H aligned with right edge of φ or associated as H* with σ' in strongest ω in φ; L as leading tone or associated to an available prefinal σ' |
| IP-level tones | H or L | H or L | H or L | H or L |
| | Aligned as L% or H% with right edge of IP | Aligned as L% or H% with right edge of IP | L aligned as L% with right edge of IP if it wins out; H aligned as H% with right edge of IP | L aligned as L% with right edge of IP if it wins out; H aligned as H% with right edge of IP |

We can see from the table that the use of scaling decreases when moving from left 
to right: in the “main” variant, represented in the first column, scaling is present 
as a cue for the prosodic separation and grouping of information structurally rel- 
evant units in all three rows, as evidenced by the analysis of the pooled data in 
section 5.2.3. It separates successively smaller levels moving downwards in the 
table: the separation between the two main parts, corresponding to the two sub- 
QUDs; between the topic and comment in each part, and between the different con- 
stituent parts within the comment. For the speakers of the “phrase accentuation” 
variant, scaling is used for fewer of these separations: ZE55 uses it only within the 
comment; NQ01 only for the separation between the main parts and the topic and 
comment, but applied to the boundary tones that are also present at the edges of 
these parts; XJ45 in his “phrase accentuation” utterances, at the very right, does not
scale his boundary tones differentially (but perhaps the register level of the L tones
in each subpart, which is not covered here). However, the table also shows that
these speakers use other means to make up for this lack of scaling: phrasing in the
form of different boundary tones associated with the right edges of phonological
phrases as well as the prosodic levels above it, together with alignment and the
phrasing difference between the first and second topic, does all the work for XJ45's
“phrase accentuation” mode; it does part of the work for the other two “phrase
accentuation” speakers, but it is much less used in the “main” variant. In the middle
columns (ZE55 and NQ01), both phrasing by boundary tones and scaling are used to
a certain degree redundantly, cueing the same boundaries, but overall, it seems
warranted to say that some kind of trade-off between these cues takes place. In
addition, a “vertical” variation (i.e., moving up or down along the axis of larger or
smaller information structural constituents) can clearly be observed that interacts
with the horizontal one: in the “main” variant, boundary tones mostly only separate
the highest levels between the two main parts (except for speakers OZ14 and
maybe ZZ24); for the “phrase accentuation” variant, also the level below, between
topics and comments; for the two speakers that produce the phonological phrase H 
tone as an edge tone, NQ01 and XJ45, even at the lowest level. However, even NQ01, 
who uses scaling for the boundary tones of the levels of minimal IP and upwards, 
does not use scaling with regards to the boundary tones at the level of the phono- 
logical phrase. This variation in cues employed for prosodic phrasing is somewhat 
reminiscent of that observed in Gabriel et al. (2011) for Argentinian Spanish. They 
also observe that boundary tones do not always get realized as posttonic pitch rises 
and peaks (continuation rises), but instead also sometimes as plateaux, pitch reset, 
or preboundary upstep. However, there are also some differences to their study. For 
the cases they discuss, they “assume an underlying intermediate phrasal boundary
tone H-, which can be phonetically realized in different ways” (Gabriel et al. 2011:
162). It is this underlying H- tone which they explicitly make out to be responsible
for “modifications of the scaling of the pitch accents located in the close surroundings
of the boundary” (ibid.). However, in the data analyzed here, there are
also sometimes clear cases in which this explanation does not suffice. As already 
discussed in section 5.2.2, the binary difference between presence or absence of a 
boundary tone at a boundary is not enough to explain the systematically observ- 
able scaling differences, which do not follow a simple separation between “high” 
and “low”, but instead reflect a more complex structure of embedded reference 
lines, as we have seen throughout. Additionally, scaling differences can be observed 
whether a clear boundary tone (in the sense of a posttonic pitch peak at a break) 
is also present or not, and for NQ01, the height of the boundary tone peaks them- 
selves differs with regard to whether they occur at the break between phonological 
phrases or between higher-level prosodic constituents. I therefore argue that here, 
scaling is not merely a variant realization of a boundary tone, but via its less
categorical nature can convey additional and independent cues to prosodic structure.
Scaling and boundary tones can occur complementarily or together, but it is not
the boundary tone that is underlying, but the prosodic structure itself. The other
difference from the findings in Gabriel et al. (2011) is that I take this variation in
cues to be largely systematic between speakers. Although the amount of data
does not suffice to say this with more certainty, in principle it might point to
an inter-speaker variation in the use of these intonational cues within the speaker
community that could be similar to that found by Niebuhr et al. (2011) between
pitch accent “shapers” and “aligners” for German and Italian.

In the lower part of Table 14 (grey background), the tone sequences provided 
at each level of prosodic structure and their association properties are given. Here, 
the most essential shared property emerges: each tonal sequence assigned at the 
level of the phonological phrase, whether realized as pitch accents on each proso- 
dic word, only once per phonological phrase, or partially as boundary tones, can 
be analyzed as LH, which was also by far the most prevalent pitch accent identi- 
fied for less complex declarative utterances in section 5.1.1. I propose that this LH 
sequence is an essential part of what makes up a “declarative” in the Spanish of
these speakers. I follow Beckman & Pierrehumbert (1986); Hayes & Lahiri (1991); 
Heusinger (2007) and others in assuming that the intonational phrase is the level 
at which a somehow ‘full’ tune is associated to a text; like Hayes & Lahiri (1991) 
for Bengali as well as in the works of the Grup d'Estudis de Prosòdia and others on
Spanish, collected in Prieto & Roseano (2010, 2009-2013); Hualde & Prieto (2015), 
I assume that a part of this tune is determined by the utterance type of the IP. LH 
is also the tone sequence taken to be associated with declaratives in most works 
on Spanish. The analysis is therefore that the information that the sentence type
is “declarative” is encoded at the level of each IP by making this the tone sequence
that is minimally produced in each phonological phrase whenever a tone is
required. Note that in this analysis, the tone sequence associated as a pitch accent
(in the variants where there are pitch accents) does not differ between those constituents
that form part of the topic and those that form part of the comment, unlike what is proposed
by Steedman (1991, 2000) for English. The difference between topics and comments
is here assumed to be conveyed only via a probabilistic relation between prosodic 
structure and information structurally relevant partitions, as detailed in section 
3.7.3. The fact that the prevailing tone sequence can successfully be identified to be 
LH, while its association and alignment properties (as pitch accents or boundary 
tones) vary systematically between the “main” and “phrase accentuation”
variants, can be seen as evidence for assuming an autosegmental tonal tier at which
tones are not associated (in line with such work as Grice et al. 2000; Gussenhoven
2000a, 2004; Torreira & Grice 2018, see also the discussion in Ladd 2008: 285-288),
that association comes later or separately,²³³ and that even though different
varieties and contexts have various effects on their association and alignment, there is
some essential unity to the pitch accents nearly always analyzed as forming part of 
Spanish declaratives, namely that they consist of precisely this tone sequence, LH, 
whether it is associated as LH*, L*H, or L+>H* (cf. Hualde 2002; Gabriel 2007 for 
similar proposals for unity). 

I assume a recursive iP/IP domain here for these Spanish utterances for the
same reasons that were put forward for other languages in section
3.6. When there is evidence, as there is here, for a number of prosodic groupings
that are only differentiated via the varying strength of a continuous cue, i.e. scaling 
here, but not via a different tonal make-up or other discrete properties, then it is 
most parsimonious to take these groupings as a recursive instantiation of a pros- 
odic domain. Incidentally, it is interesting that it is the iP/IP domain which shows 
evidence of being recursive, since arguments for or against the existence of inter- 
mediate phrases not only in Spanish often hinge on whether "lesser" instantiations 
of IP-level boundaries should be taken as evidence of a separate category or not. 
This problem would not arise anymore with the assumption of a recursive IP cate- 
gory. Assuming a recursive IP domain does not mean that a recursive IP structure is 
always present in all utterances. As Féry (2017: 78) also argues, it is only utterances 
exceeding a certain length and complexity that show evidence of recursive phra- 
sing, as we also saw in the analysis of the responses to items 14 and 26 in compari- 
son to the longer ones. 

The proposed recursive structure is of a kind that has been called compound 
phrasing in Frota (2000, 2012, 2014) and compound prosodic domains in Ladd 
(2008). These are “balanced” recursive prosodic structures in the sense of van der
Hulst (2010: 319-320), where a recursive prosodic constituent may dominate two 
constituents of the same category (one level below, (69)a), but not one of the same 
category and one of the one below ((69)b). 


233 I do not want to imply any actual temporal sequence in a processing model here. The point 
(also in the works cited) is just that it makes sense to conceive of tone sequences for a certain pro- 
sodic domain as separate from their concrete associations, because this allows for a variety of gen- 
eralizations on empirical observations otherwise not possible. This is not very far from the original 
concept of tones as autosegments (Goldsmith 1976), enriched by an explicit formulation about at 
which level in the prosodic hierarchy tone sequences are provided. Gussenhoven (2004: 146-147), 
working in an OT framework, also reveals a conception of tones having an existence independent 
of their associations, when he argues that faithfulness constraints preserve “phonological sub- 
stance", amongst which he counts tones, but not *a relation, like an association or an alignment". 
(69) a. balanced/compound recursive structure
         [                         ]IP_n
         [         ]IP_n-1 [         ]IP_n-1
     b. unbalanced recursive structure
        *[                         ]IP_n
         [         ]φ      [         ]IP_n-1

As discussed in section 3.6.2.4, Frota proposes a recursive IP for European Portu- 
guese on the basis of sandhi and intonational phenomena, because the boundary 
cues are of the same type at each edge (fricative voicing, preboundary lengthening, 
boundary tones), but those that are gradient are notably stronger at those bound- 
aries with higher-level edges. Because the boundaries only differ by phonetic 
strength, but not by type, Frota argues that the recursive analysis is superior to one 
involving two different prosodic units, e.g. the IP and the ip (Frota 2014: 12-14).
This is the argument also used here. The structures analysed here as recursive are
also of the type in (69)a, where no IP (or any other prosodic unit of any category) is
ever analysed as dominating units of different categories, but always ones of equal 
category (either of the same category, but a level below, or of the category immedi- 
ately below). Only NONRECURSIVITY is violated in such structures, while the other 
components of the SLH are all kept intact (cf. section 3.6.1). With such structures, 
the categories of the prosodic hierarchy stay strictly ranked, and prosody remains 
essentially flatter than syntax, but proliferation of prosodic categories is somewhat 
hemmed in and phenomena such as boundaries that are of different degree but the 
same type can be elegantly accommodated (Ladd 2008: 298-299). In other words, 
exactly those empirical observations that we have made throughout this chapter 
can be theoretically accounted for without all hell breaking loose. 
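
The condition on balanced recursive structures can be illustrated with a minimal sketch. The Python code below is not part of the analysis itself; the class `Constituent`, the function `is_balanced`, and the reduced three-member hierarchy are assumptions made purely for illustration. The sketch checks that every constituent dominates daughters of a single category only, which must be either its own category (recursion, as in (69)a) or the category immediately below.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumed, simplified prosodic hierarchy, ordered from highest to lowest category.
HIERARCHY = ["IP", "PhP", "PW"]


@dataclass
class Constituent:
    category: str
    daughters: list[Constituent] = field(default_factory=list)


def is_balanced(node: Constituent) -> bool:
    """True if no constituent dominates daughters of different categories,
    and all daughters are of the mother's category or the one directly below."""
    if not node.daughters:
        return True
    categories = {d.category for d in node.daughters}
    if len(categories) != 1:
        return False  # mixed daughters, as in the unbalanced structure (69)b
    (cat,) = categories
    rank = HIERARCHY.index(node.category)
    below = HIERARCHY[rank + 1] if rank + 1 < len(HIERARCHY) else None
    if cat != node.category and cat != below:
        return False  # a level of the hierarchy has been skipped
    return all(is_balanced(d) for d in node.daughters)


# (69)a: an IP dominating two IPs, each of which dominates phonological phrases
balanced = Constituent("IP", [
    Constituent("IP", [Constituent("PhP"), Constituent("PhP")]),
    Constituent("IP", [Constituent("PhP")]),
])
# (69)b: an IP dominating one phonological phrase and one IP
unbalanced = Constituent("IP", [
    Constituent("PhP"),
    Constituent("IP", [Constituent("PhP")]),
])
```

Under these assumptions, `is_balanced(balanced)` holds and `is_balanced(unbalanced)` does not, which mirrors the contrast between (69)a and (69)b: only NONRECURSIVITY is given up, while the ranking of categories stays strict.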


5.3 OT-Analysis of the “main” and “phrase” accentuations 
and their relation to Quechua 


This section develops an OT-analysis for the main and phrase accentuation vari- 
ants of Spanish declarative utterances as described in the previous sections. In 
section 5.1, we saw that the phrase accentuation is likely influenced by information 
structure, whereas in section 5.2 it was also shown to be a variant characteristic of 
individual speakers. Here, only the phonological aspects of the difference between 
the variants will be treated. Although the difference between the two is the main 
strand of variation observed in the data here, concentrating on it necessarily leaves 
out other interesting aspects. However, as will become clear from the analysis,
the prosodic properties that determine the difference between these
Spanish variants are similar to those that are relevant for the differentiation
between the Quechua intonational variants described and analysed in chapter 6.
In this sense, the OT-analysis in both sections constitutes a proposal for how the
prosodic space of possibility available to the bilingual speakers of the Huari speech
community, comprising both Spanish and Quechua, can be delineated via what I
suggest are some of its most important dimensions. Such a notion has to be
considered carefully and formulated with a high degree of differentiation. Note that the
analysis for Quechua proposes that word stress plays only a marginal role; mostly 
what is responsible for the characteristic pitch movement in Quechua phrases is 
therefore analysed as phrasal tones aligned with the edges of phonological phrases 
(see sections 6.1, 6.3). In contrast, for the Spanish phrase accentuation examples 
we have seen that at least in some cases, pitch peaks are realized on the stressed 
syllable of the last word in a phrase, meaning that the Spanish word stress is in fact 
preserved. In addition, the tonal sequence assigned at the level of the phonologi- 
cal phrase for Quechua declaratives is LH, HL, or LHL, depending on the type of 
phrase, whereas the analysis for Spanish has shown that it is LH, also in the “phrase
accentuation” variant. Any theory proposing some kind of intrusive
implementation of “Quechua” grammar wholesale into what is otherwise “Spanish” grammar
would therefore fall short of the facts here. The “phrase accentuation” variant is
neither just “Quechua in Spanish” nor “Spanish in Quechua”. The analysis will
instead demonstrate that it is possible to account for the observed similarities and
differences in a granular fashion, allowing for intermediate differentiation.²³⁴ As
stated in section 4, the OT-analyses in both main chapters are intended to tackle the
third group of research questions (36) that ask about which prosodic properties of 
Huari Spanish and Quechua are specific to each language, and which are shared. 
I use OT for this, in particular in answer to subquestion (36)b, because an OT-ana- 
lysis describes the observed prosodic behaviour as a ranked set of separately iden- 
tifiable constraints. It thus allows for a fine-grained analysis of what the minimal 
structural differences between individual observed variants are, and can thus lay 
bare how far apart or close together the different variants are along several dimen- 
sions in a prosodic possibility space independent of which language they belong to. 


234 Note that the version of OT used here is not fine-grained enough to capture, for example,
the continuous scaling differences between a pitch accent that is “fully accented”, “slightly
compressed”, “very compressed”, and “deaccented”. I take it for granted that a fully realistic model of
both the phonetics and phonology involved here would require recourse to articulatory,
perceptual, and neuronal explanations and specifications, e.g. as in Katsika (2012); Katsika et al. (2014);
Tilsen (2013, 2019); Boersma et al. (2020). However, I think that the OT analysis here can still
demonstrate those relations between the variants that are most relevant, in part precisely because it
abstracts away from many of the more continuous aspects.
5.3.1 Variation in tone distribution for the phonological phrase and the
intonational phrase 


Some assumptions to start with: in non-tonal languages such as Spanish, the tones 
that are associated with stressed syllables change their identity, if at all, only for the 
expression of postlexical meaning and not according to lexical specification. So, no 
tonal specification takes place at the level of the prosodic word, and instead only 
at the level of the phonological phrase and the intonational phrase. I take the tone 
sequence LH for Spanish declaratives to “belong” to the phonological phrase level
(with the possible number of tone sequences per PhP differing between “main” and
“phrase accentuation”), and further boundary tones to belong to the intonational
phrase level. By saying that tones “belong” to the PhP I mean that they can be taken
to be indexed, T_φ, and thus that one set of association and alignment constraints
refers to them, while another could refer, e.g., to tones belonging to the IP,
differently indexed (T_ι). This is the practice also followed in Gussenhoven (2004). At the
same time I assume with some of the literature that tones must be unassociated at 
some level of the phonological representation, and that alignment and association 
are processes separate from tonal identity. Thus at the phonological phrase level, 
tones are provided whose identity is determined by such meaning-related factors 
as utterance type (or e.g. modality). These tones are not directly associated to a spe- 
cific position and their number is determined by the number of relevant prosodic 
units. As argued above, I assume that the relevant unit in the case of the “main” 
variant in Spanish is that of the prosodic word: for each prosodic word, an LH tone 
sequence is assigned at the level of the phonological phrase. For the “phrase
accentuation” variant of Spanish (and for Quechua), the relevant unit is the
phonological phrase itself: for each phonological phrase, an LH tone sequence is assigned
for Spanish, and an LH/HL/LHL sequence for Quechua. For easy reference, I will
call those tones that are assigned once per prosodic word in the “main” variant
and once per phonological phrase in the “phrase accentuation” variant the “minimal tone sequence”.²³⁵ In the “main”
variant, several such minimal tone sequences can then make up the tone sequence 


235 Cf. the observations by Ladd (2008: 285-287) on the obligatoriness of at least one pitch accent 
in a phonological phrase (intermediate phrase in his terms), associated with either the strongest 
metrical position in that domain or an edge, and on his concurring assumption that the other, 
pre-“nuclear” pitch accents, if they occur, are somehow subordinate to the main pitch accent, and 
that their tonal make-up is “a single linguistic choice”, while their number depends on the metrical 
structure and how tones are assigned to positions in it. The claim is that whether there are one or 
four prenuclear pitch accents, they will all consist of the same tone sequence. Based on the analysis 
in the previous sections, I take the tone sequence LH in Huari Spanish to also be the same for the 
final and obligatory nuclear accent, so that it is fully general. 
for a phonological phrase, whereas in the “phrase accentuation” variants only one
occurs per phonological phrase. In all variants, at the level of the IP, additional 
boundary tones might then be added to make up the IP tone sequence. Thus, the 
first main point of variation to be modeled is concerned with how many minimal 
tone sequences are allowed per phonological phrase. This is not totally straight- 
forward. I base many mechanisms of the analysis on the intonational OT-model 
by Gussenhoven (2004). Although Gussenhoven (2004: 145) asserts that “[b]roadly, 
alignment constraints are responsible for creating the underlying tone string", his 
subsequent discussion of tonal alignment constraints is not concerned with the 
problem at hand here, namely the number of tones that are available for
association and alignment within a given prosodic domain. In his discussion of English
compounds, he does cover the distribution of pitch accents, but it is a somewhat dif- 
ferent problem: there, the problem is determining where in units of similar size the 
main stress falls. Consequently it is a question of determining where the one pitch 
accent in such a unit will occur, not how many pitch accents can occur in a unit. 
That is to say, the English compound stress problem is concerned with how infor- 
mation from morphology can work on metrical structure so that the main stresses, 
at which the pitch accents occur, shift (cf. (70)), whereas in our case, the metrical 
structure between the main and phrase accentuation variants is the same in the 
relevant points, but the distribution of pitch accents per domain changes (cf. (71)). 


(70) (from Gussenhoven 2004: 276) 
a. TOM Paine Big BAND (i.e. a Big Band led by Tom Paine) 
b. Tom PAINE Street BLUES (i.e. blues induced by Tom Paine Street) 
c. TOMcat-free ROOF 


(71) difference in pitch accent distribution between main and phrase accentuation
     a. de  u-  na  ca- sa  pe- que- ña    (QZ13 Elqud ES 65)
        x   x   x   x   x   x   x    x
            x       x           x
                                x
            LH*     LH*         LH*
     b. de  la  ca- sa  pe- que- ña        (ZE55 Elqud ES 59)
        x   x   x   x   x   x    x
                x           x
                            x
                            LH*
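
The difference in (71) can be restated as a small sketch. The following Python code is an illustration only, not a component of the OT-analysis: the function name `tone_sequences`, the variant labels, and the grouping of the two examples into prosodic words (read off the second grid row) are assumptions. It merely counts how many minimal tone sequences are provided for one phonological phrase, depending on whether the relevant unit is the prosodic word or the phonological phrase itself.

```python
MINIMAL_TONE_SEQUENCE = "LH"  # the tone sequence taken to mark Spanish declaratives


def tone_sequences(phonological_phrase, variant):
    """Return the minimal tone sequences provided for one phonological phrase.

    phonological_phrase: list of prosodic words (strings)
    variant: "main" (one sequence per prosodic word) or
             "phrase" (one sequence per phonological phrase)
    """
    if variant == "main":
        return [MINIMAL_TONE_SEQUENCE for _ in phonological_phrase]
    if variant == "phrase":
        return [MINIMAL_TONE_SEQUENCE]
    raise ValueError(f"unknown variant: {variant!r}")


# Assumed prosodic-word groupings for the two utterances in (71)
phrase_a = ["de una", "casa", "pequeña"]
phrase_b = ["de la casa", "pequeña"]

main_tones = tone_sequences(phrase_a, "main")      # three LH sequences, as in (71)a
phrase_tones = tone_sequences(phrase_b, "phrase")  # a single LH sequence, as in (71)b
```

The metrical structure handed to the function is the same kind of object in both cases; only the unit that the count refers to changes, which is the point of contrast with the English compound stress problem.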


Equally, he uses the example of the different respective orders of the intonational 
phrase boundary tone and the lexical tone in Venlo Dutch vs. Roermond Dutch (cf. 
Gussenhoven & van der Vliet 1999 and Gussenhoven 2000a, respectively) to argue
that phonological alignment²³⁶ and association are two independently needed
processes to describe tonal behaviour, and to claim that “since alignment constraints
are generally used as the mechanism to determine the order of morphemes, it
would be undesirable to devise a different one just for tones” (Gussenhoven 2004:
150). In (72)a, the right-alignment of the lexical H tone (given as H_lex), associating
with the final mora of the intonational phrase, takes precedence over that of the
L boundary tone (given as L_bound), while in (72)b, a different ranking results in the
reversed tone order. However, this case cannot simply be transferred to our
situation: the H and L tones in the Dutch example come from two different sources (one
is lexical, the other is an intonational phrase-level boundary tone), while in our case,
the order LH for our tones is again a “single linguistic choice”, that of choosing one
(intonational) morpheme over another. To come back to the segmental comparison,
using alignment for the determination of the LH sequence would be like saying
that not only the order of suffixes is determined by alignment constraints, but also
their internal make-up. In fact, Gussenhoven (2004: 151, footnote 7) points out that
the different order of the tones in the two dialects of Dutch actually comes from
a “different sequencing of the tones”, confirming that tone sequence is not purely
determined by alignment, but still leaving unanswered the question that is
pertinent here.


(72) Adapted from Gussenhoven (2004: 150)
     a. Representing Roermond Dutch (schematized)
        μ μ μ ]
        L_bound H_lex
     b. Representing Venlo Dutch (schematized)
        μ μ μ ]
        H_lex L_bound


The fact that Gussenhoven (2004) does not seem to be concerned with the number 
and identity of the minimal tone sequences per a given prosodic domain as an 
issue for the constraint-based analysis suggests a seemingly easy solution for the 
problem here: these can simply be taken to be part of the input and are therefore 
not the result of constraint interaction, and we would then only need faithfulness 


236 Not meant here is phonetic alignment, cf. the introduction to the terms in section 3.5. 
constraints to keep them the way they are. In a way, this does make sense: the tonal
processes that Gussenhoven addresses and proposes to treat with constraints for 
alignment and association are “phonological adjustments” (2004: 145) that take the 
properties we are here concerned with as previously established. However, this 
is problematic: the input is usually taken to be an underlying lexical representa- 
tion (Kager 1999, who is only concerned with segmental phonology, takes the main 
purpose of faithfulness constraints to be the preservation of lexical contrasts), but 
the tones in an intonation language are postlexical. Alternatively, they could be 
part of a postlexical constraint-based process that takes place before alignment and 
association (intuitively, this seems very plausible). But then, and not just for our 
purposes here, it would still be important to understand the constraint interaction 
in this part of the process, since the number of tones per prosodic domain is clearly 
subject to systematic variation, not just in the Huari data, but in intonational
typology in general.²³⁷ It should therefore be analysed in terms of constraint
interaction, and not just be seen as something given. However, even though such an
ordering of processes seems quite likely and appealing, it goes against the general
assumption²³⁸ of parallelism in OT, meaning that all structure really is evaluated
(and built up) in parallel. It seems, however, as if the strictness of this
assumption has been somewhat relaxed in practice, also in Gussenhoven (2004). A further


237 As “frequency or domain of pitch accents/AP/word tones”, it forms criterion iii for determining
the macro-rhythm of a language in Jun's (2014b: 526) prosodic typology: “[l]anguages where every
word receives a pitch accent or AP/word boundary tone are more macro-rhythmic than those with
less or more frequent pitch accents or AP/word boundaries per word”. This is used to group
Spanish, with a pitch accent on nearly every content word, as a language with strong macro-rhythm
apart from English, where pitch accents occur less frequently per phonological phrase, and which 
has medium macro-rhythm. Cf. Cole et al. (2019: 115) who repeat this grouping. 

238 Kiparsky (2015: 3) calls it “the central principle of OT”, but argues (11-12) that any kind of
modularity assumption, also one separating between syntax and phonology in grammar, is strictly 
speaking a violation of it, and that a less radical alternative, modularization or stratification, has in 
fact been explicitly practiced in a number of works, including such early ones as McCarthy & Prince 
(1995). Prince & Smolensky (2004 [1993]: 7, 25) assert that the majority of their analyses is based on 
the assumption of parallelism, but also admit that deciding whether parallel or serial approaches 
should be favored is “a challenge of considerable subtlety”. Gussenhoven (2004: 276—278) treats the 
mechanisms responsible for the different stress patterns of (70) as the result of a constraint-based 
version of Lexical Phonology (Kiparsky 1982), which then “present themselves to the postlexical
grammar of English”. That is to say, he tacitly also assumes some kind of modularity or serialism
within morphophonology, with each module being governed by OT-like constraints, but obviously
with different constraint rankings at each of them. This is the essence of what Kiparsky (2015) more
explicitly proposes as “Stratal OT”. A different modification of OT that is also intended to overcome
the problems that arise from a strict application of parallelism is Harmonic Serialism (McCarthy 
2016), where the results of a one-step-only modification by Eval are fed back as input to Gen in a 
loop, until no further changes are effected by the constraints (convergence is reached). 
option, in any case, is just as problematic: if the tones are actually assigned by some
constraint on the input, then it will not be possible to have any faithfulness
constraints act upon them, since faithfulness constraints relate what is in the output
to what is in the input. In fact, Gussenhoven (2004: 146–147) explicitly assumes
that tones are part of the input.²³⁹ In the following, I adopt an approach similar to
the one Kiparsky (2015) proposes. I assume some kind of serialisation or modulari- 
zation whereby the output of one process (tone sequence distribution) becomes the 
input of another (alignment and association) — because this is crucial for the differ- 
ence between main and phrase accentuation. The tones that are aligned and asso- 
ciated have to be the output of a previous process of constraint evaluation, because 
they are clearly postlexical, and not assigned by markedness constraints. One way 
to arrive at different amounts of minimal tone sequences in the different variants 
would be to assume the maximal number of them in the input and then have a
standard faithfulness constraint like MAXIO(T) (73) preserve them for the “main”
variant, while for the “phrase accentuation” variant, a constraint that deletes all
minimal tone sequences that do not belong to prosodic words that are heads at
the level of the phonological phrase would have to be ranked higher,
leaving only one minimal tone sequence per phonological phrase under pitch
accent culminativity. However, here the question arises how the input “knows” how
many minimal tone sequences (equaling the number of prosodic words) it should 
provide, if all structure really is evaluated (and built up) in parallel (i.e., this would 
have to run parallel to processes e.g. of cliticization that might result in altering the 
number of prosodic words). Making the basic assumption even more general opens 
up a path to pursue: we can assume that the input actually invariably provides 
a minimal tone sequence for each TBU contained in the domain at which tones 
are assigned (i.e. one tone sequence per mora or syllable in the PhP or IP), and 
that a family of markedness constraints that delete minimal tone sequences exists,
whose ranking relative to MAXIO(T) would then determine how many minimal tone 
sequences there actually are in a larger domain such as the phonological phrase. 


239 "Faithfulness is expressed in terms of correspondences between the elements in the input 
and elements in the output (McCarthy and Prince 1995). By ‘element’ I mean any phonological 
substance, like a feature, a segment, a tone, an accent, a constituent like ,but not a relation, like 
an association or an alignment." At a later point, he contradicts this definition by proposing a
faithfulness constraint FAITH(Assoc) that is intended to preserve the difference between Accent 1 and 2
in Swedish (Gussenhoven 2004: 216). It is quite telling that this slip occurs in connection with lexical
instead of intonational tones, because it indicates an overall grammar model that is modular, in 
which certain (lexical) relations are established before others (postlexical ones) can follow. 
(73) MAXIO (T): every tone in the input has a correspondent in the output (adapted
from Gussenhoven 2004: 147)


The desired output results can be produced using several constraints employed
in Gussenhoven (2004). He has NOCONTOUR²⁴⁰ and OCP (obligatory contour
principle) as families of constraints ordered with respect to the level of structure at
which each member applies. We also need another faithfulness constraint, IDENT, 
which preserves the identity of elements from input to output (cf. Gussenhoven 
2004: 147), but adapted specifically to preserve the minimal tone sequence (TS) for 
the phonological phrase. 


(74) OCP (T, α): no adjacent tones may be the same if they are within the same
domain α.


(75) NOCONTOUR (T, α): no adjacent tones may be different if they are within the
same domain α.
a. NORISE (T, α): no adjacent tones may be LH if they are within the same
domain α.
b. NOFALL (T, α): no adjacent tones may be HL if they are within the same
domain α.
c. NOCROWD (T, TBU): no two tones may be associated to the same TBU.
(all adapted from Gussenhoven 2004: 146)


(76) IDENT (TS, φ): Disallows tone sequences in the phonological phrase that are
not whole multiples of the minimal tone sequence.


Appropriately different relative orderings of these constraints could then result
in allowing tone sequences to survive only once per syllable (as in Mandarin-like
tone languages), once per prosodic word (as in the “main” variant here and other
varieties of Spanish), or once per phonological phrase (as in the “phrase
accentuation” variant here as well as in Quechua and other languages, e.g. Bengali (Hayes &


240 The NOCONTOUR group of constraints is motivated articulatorily in Gussenhoven (2004:
146): overly complex (tonal) configurations are demanding articulatorily and therefore avoided. 
Although this looks at first sight like a clear-cut conflict between a perceptual, hearer-oriented 
tendency for dissimilation (OCP) and an articulatory, speaker-oriented tendency for assimilation, 
Boersma (1998: 416) also describes a tendency against the repetition of similar articulatory ges- 
tures, which he takes to be part of the articulatory functional motivation for the OCP. 
Lahiri 1991)). The approach therefore has the merit of providing some typological
comparability while not postulating any language-specific constraints.²⁴¹
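
How re-ranking the faithfulness constraints relative to the OCP and NOCONTOUR families changes the optimal candidate can be illustrated with a minimal sketch. The Python code below is a simplification under stated assumptions and not the evaluation procedure of the analysis itself: I assume that OCP and NOCONTOUR incur one violation per adjacent pair of like or unlike tones within a domain, that IDENT(TS, φ) incurs a single violation when the tone string of the phrase is not a whole multiple of LH, and that strict domination can be modelled as a lexicographic comparison of violation profiles. The candidate set is a small selection of shapes (all names are hypothetical labels), evaluated for an input of two prosodic words with two syllables each.

```python
# A candidate is one phonological phrase: a list of prosodic words, each a list
# of syllables, each a string of tones ("" marks a toneless syllable).
INPUT = [["LH", "LH"], ["LH", "LH"]]


def domains(cand, level):
    """Tone strings of all domains of the given level in the candidate."""
    if level == "syllable":
        return [syl for word in cand for syl in word]
    if level == "word":
        return ["".join(word) for word in cand]
    return ["".join("".join(word) for word in cand)]  # level == "phrase"


def ocp(level):
    # one violation per pair of adjacent identical tones within a domain
    return lambda cand: sum(
        a == b for d in domains(cand, level) for a, b in zip(d, d[1:]))


def nocontour(level):
    # one violation per pair of adjacent different tones within a domain
    return lambda cand: sum(
        a != b for d in domains(cand, level) for a, b in zip(d, d[1:]))


def max_io(cand):
    # one violation per input tone without a correspondent in the output
    n_in = sum(len(syl) for word in INPUT for syl in word)
    n_out = sum(len(syl) for word in cand for syl in word)
    return n_in - n_out


def ident_ts(cand):
    # violated if the phrase's tone string is not a whole multiple of LH
    tones = domains(cand, "phrase")[0]
    return 0 if tones == "LH" * (len(tones) // 2) else 1


MARKEDNESS = {lvl: [ocp(lvl), nocontour(lvl)]
              for lvl in ("syllable", "word", "phrase")}
FAITH = [ident_ts, max_io]

RANKINGS = {
    "main": MARKEDNESS["syllable"] + FAITH
            + MARKEDNESS["word"] + MARKEDNESS["phrase"],
    "phrase accentuation": MARKEDNESS["syllable"] + MARKEDNESS["word"]
                           + FAITH + MARKEDNESS["phrase"],
}

CANDIDATES = {
    "LH on every syllable": [["LH", "LH"], ["LH", "LH"]],
    "LL on every syllable": [["LL", "LL"], ["LL", "LL"]],
    "one LH per word": [["L", "H"], ["L", "H"]],
    "one LH per phrase": [["", "L"], ["", "H"]],
    "one HL per word": [["H", "L"], ["H", "L"]],
    "no tones": [["", ""], ["", ""]],
}


def optimal(ranking):
    """Name of the candidate with the lexicographically smallest profile."""
    return min(CANDIDATES,
               key=lambda name: tuple(c(CANDIDATES[name]) for c in ranking))
```

With these assumptions, `optimal(RANKINGS["main"])` selects the candidate with one LH per prosodic word, and `optimal(RANKINGS["phrase accentuation"])` selects the candidate with one LH per phonological phrase. If the two faithfulness constraints are ranked below all markedness constraints, the toneless candidate is selected.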


Table 15: OT-tableau with the constraint rankings to arrive at the correct tone distribution for the 
“main” variant of Spanish. NOCONTOUR abbreviated to NoCo. 


(((LH)σ(LH)σ)ω((LH)σ(LH)σ)ω)φ   OCP     NoCo    IDENT    MAXIO   OCP     NoCo    OCP     NoCo
                                 (T,σ)   (T,σ)   (TS,φ)   (T)     (T,ω)   (T,ω)   (T,φ)   (T,φ)


b. (LL) (LL)o)u((LL)o(LL)c)ug — ****! 7* 6* 7* 

P c. ((L)s(H)gu((L)5(H)))u)e ERR ** P 
d. (OLH) allla) *| 6* * 

e. ((olL)c)ulolH)o)udg 6*! * 

f. (COLL) *| 6* * 

g. (HAD CCH) (Doug R Job ** sox 


Table 16: OT-tableau with the constraint rankings to arrive at the correct tone distribution for the 
“phrase accentuation” variant of Spanish. NOCONTOUR abbreviated to NoCo. 


(((LH)σ(LH)σ)ω((LH)σ(LH)σ)ω)φ   OCP     NoCo    OCP     NoCo    IDENT    MAXIO   OCP     NoCo
                                 (T,σ)   (T,σ)   (T,ω)   (T,ω)   (TS,φ)   (T)     (T,φ)   (T,φ)


a. ((LH)&(LH)g (E H)s(EH)g)u)s ddl 6* 7* 
b.((LL)g(LL))4(LD)4(LL)))g — ****! 6* 7* 7* 

c. (L)s(H)9)u(D)5(H)9)u)a Æ FEIR ^ 
d. (COLHA) *l 6* x 
= e. ((Qs(L))9u(05(H)3)u)o 6* £ 

f. (COLO) *l 6* * 

g. ((H)s(L)u(H)s(L)3)u)o já on AER ^ 


In Tables 15 and 16, the tableaux showing the correct rankings for producing
the “main” and “phrase accentuation” variant tone distribution, respectively, are
given:²⁴² the separate OCP and NOCONTOUR constraints are “inherently” ordered


241 For many language varieties, it would of course also need to be enriched by some mechanism 
determining the tonal identity for each minimal sequence where it is not the same across the board 
(e.g. in many tone languages or when a nuclear tone sequence is different from a prenuclear one). 
242 Note that all tones resulting from this process are indexed as belonging to the PhP (Tọ). In 
Table 15 and Table 16, the indexation on the brackets only refers to the boundaries of the corres- 
ponding prosodic units. 
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(cf. Gussenhoven (2004: 146)): their least demanding versions (disallowing like or 
unlike adjacent tones within a mora or syllable) are ranked higher than their more 
demanding versions, since if a language does not allow like adjacent tones within 
a word, it will automatically also ban them within a syllable. They are, however, 
ranked so that they are interwoven with each other: the NOCONTOUR constraint for 
the syllable level is ranked below the OCP constraint for the syllable, but higher 
than the OCP constraint for the prosodic word, and so on. This will have the result 
that as long as nothing else intervenes, an output with no tones at all will be the 
optimal candidate, since in that way, both OCP and NoCONTOUR as defined in (74) 
and (75) are maximally satisfied (resulting in the elimination of candidates a and b 
in both tableaux). The crucial difference between Tables 15 and 16, and therefore 
between the tones output for the “main” and the “phrase accentuation" variant, 
respectively, is the ranking of the faithfulness constraints MAXIO(T) and IDENT(TS, 
$) with respect to these two markedness constraints: if they are ranked between the 
syllable- and the prosodic word-level versions of OCP and NOCONTOUR, as in Table 
15, then the candidate with exactly one LH tone sequence per prosodic word (c) will 
incur the smallest number of high-ranking violations, which is the correct result 
for the *main" variant of Spanish. If, on the other hand, they are ranked between 
the prosodic word- and phonological phrase-level versions of the two markedness 
constraints, then the candidate with one LH sequence per phonological phrase (e) 
will be optimal, just as in the *phrase accentuation" variant. The relative ranking of 
IDENT(TS, $) over MAXIO(T) ensures that candidates d and f are not selected. At this 
point it has to be noted that the constraint model here creates a somewhat more 
categorical picture than what is found in reality. As section 5.1.3.1 showed, utter- 
ances with *phrase accentuation" often do not fully eliminate all traces of prefinal 
pitch accents but merely drastically reduce their scaling relative to that of the final 
one. In reality therefore, the OCP- and NOCONTOUR constraints here should proba- 
bly thought of as having an effect that is gradual in this way. In the following I will 
leave this issue aside, but consider it a very worthwhile topic for future research. 
Having acknowledged these restrictions, the approach seems viable overall. It
has a consequence that at first sight might seem somewhat strange for a language
like Spanish: the most “faithful” output would be one in which there are LH-rises on
each syllable or even mora, a situation that looks quite like an (admittedly strange)
tone language. However, this is what should be the case if we take the idea
seriously that it is mainly the different constraint rankings that create the typological
differences between languages. And since the markedness constraints of OCP and
NOCONTOUR will eradicate all of these tones up to the point where the two
faithfulness constraints intervene, this will result in a tonal optimization of either the
word (“main” variant) or the phrase (“phrase accentuation” variant), which is the
desired result. For the second leg of the variation journey we need to complete in
order to move from “main” variant Spanish to the phrase accentuation, and thence
to Quechua, I now make the assumption of modularization (cf. Kiparsky 2015).
Because the association constraints in Gussenhoven (2004) all take the identity and
number of underlying tones to be a settled issue, they would run into problems
when acting at the same time as the ones we have just discussed. I assume that part
of the difference between the “main” and the “phrase accentuation” variant, that
which concerns the number of minimal tone sequences provided, can be modeled
using the approach detailed above and given in Tables 15 and 16. The tones
provided by them are then available to be aligned and associated according to the
constraints treated in what follows. This will reveal further variation, adding to the
dimensions along which we can describe the variants to differ.
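
The interleaving of OCP, NOCONTOUR and MAXIO(T) described for Tables 15 and 16 can be
checked mechanically. The following is a minimal sketch in Python, not the procedure used
for the analysis. It assumes that OCP counts adjacent like tones and NOCONTOUR adjacent
unlike tones inside each domain of a given level (my reading of the informal description
above; the formal definitions in (74) and (75) are not repeated here), and that MAXIO(T)
counts deleted input tones. It leaves out the IDENT constraint and the candidates d, f and
g. The placement of the tones inside each word in candidate e, the tone-less candidate
"none" and all function names are illustrative assumptions.

```python
# A candidate is a phonological phrase: a list of prosodic words,
# each a list of syllables, each syllable a string of tones ("LH", "L", "").
INPUT = [["LH", "LH"], ["LH", "LH"]]

CANDIDATES = {
    "a": [["LH", "LH"], ["LH", "LH"]],  # fully faithful
    "b": [["LL", "LL"], ["LL", "LL"]],  # all tones kept, every H changed to L
    "c": [["L", "H"], ["L", "H"]],      # one LH sequence per prosodic word
    "e": [["", "L"], ["", "H"]],        # one LH sequence per phonological phrase
    "none": [["", ""], ["", ""]],       # no tones at all (assumed extra candidate)
}


def domains(cand, level):
    """Return the tone string of every domain at the given prosodic level."""
    if level == "syllable":
        return [syl for word in cand for syl in word]
    if level == "word":
        return ["".join(word) for word in cand]
    return ["".join(syl for word in cand for syl in word)]  # phrase


def adjacent_pairs(cand, level, like):
    """Count adjacent tone pairs within a domain that are alike (OCP) or unlike (NoContour)."""
    return sum((x == y) == like
               for d in domains(cand, level)
               for x, y in zip(d, d[1:]))


def max_io(cand):
    """One violation per input tone that has no correspondent in the output."""
    n_in = sum(len(syl) for word in INPUT for syl in word)
    n_out = sum(len(syl) for word in cand for syl in word)
    return n_in - n_out


def profile(cand, ranking):
    """Violation counts ordered by the ranking, highest-ranked constraint first."""
    out = []
    for constraint in ranking:
        if constraint == "MAX":
            out.append(max_io(cand))
        else:
            name, level = constraint.split(":")
            out.append(adjacent_pairs(cand, level, like=(name == "OCP")))
    return tuple(out)


def winner(ranking):
    """Tuple comparison is lexicographic, which mirrors strict domination."""
    return min(CANDIDATES, key=lambda k: profile(CANDIDATES[k], ranking))


MARKEDNESS = ["OCP:syllable", "NOCO:syllable", "OCP:word",
              "NOCO:word", "OCP:phrase", "NOCO:phrase"]
MAIN = MARKEDNESS[:2] + ["MAX"] + MARKEDNESS[2:]     # faithfulness as in Table 15
PHRASE = MARKEDNESS[:4] + ["MAX"] + MARKEDNESS[4:]   # faithfulness as in Table 16
NO_FAITH = MARKEDNESS + ["MAX"]                      # nothing intervenes

print(winner(MAIN))      # c
print(winner(PHRASE))    # e
print(winner(NO_FAITH))  # none
```

Under these assumptions the faithful candidate a receives four syllable-level, six
word-level and seven phrase-level NOCONTOUR violations, and moving MAXIO(T) one level
further down is all that is needed to switch the winner from c to e.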


5.3.2 Variation in association and alignment 


The second dimension of variation that this OT-analysis intends to cover has to
do with the different association and alignment behaviours the different variants
have. In the “main” variant and in ZE55's “phrase accentuation” variant of Spanish,
the H tone of the LH minimal tone sequence is associated with the metrically
strongest position in the prosodic word or the phonological phrase, respectively. Even
though in ZE55’s case it is only the metrically strongest position in the phonological
phrase and not the stressed syllables of non-final prosodic words (like in the “main”
variant) that are associated with the H tone, there is no need for a difference in
constraint ranking between the “main” variant and ZE55's one, as long as different
numbers of minimal tone sequences are provided. The constraints we use here are
all taken or adapted from Gussenhoven (2000a, 2004). Gussenhoven (2004: 149) has
two groups of association constraints, one mandating that a certain TBU be
associated with a tone, and another mandating that a certain tone be associated with a
TBU. I take TBUs to be syllables here. The constraints associating TBUs with tones
are inherently ordered, according to Gussenhoven (2004: 149), in that a language
that associates tones to all syllables will also associate tones to stressed syllables,
but not the other way around. Thus the constraints (77)-(79) decrease in the specific
demand they make on the input. NoAssoc is a constraint ranked somewhere in
between these inherently ordered constraints, making sure that not all languages
need to associate tones to all possible TBUs. NOCROWD makes sure that only one
tone can associate with a given TBU. Mostly this is sufficient, but in section 5.3.3,
cases will be discussed showing that NOCROWD does not in fact capture the
phenomena related to tonal crowding realistically.

(77) (σ')ω' ← T: associate a tone with the stressed syllable of the metrically strong
prosodic word in a phonological phrase.

(78) σ' ← T: associate a tone with a stressed syllable.

(79) σ ← T: associate a tone with a syllable.

(80) NoAssoc: TBUs are not associated with tones.

(81) NOCROWD: a TBU has only one tone.


The constraints associating tones to TBUs are specified according to which tone
they refer to. Here, non-subscripted tones refer to the tones assigned by the
phonological phrase, and tones subscripted with a small ι refer to tones assigned by the
intonational phrase.


(82) H → TBU: the H tone is associated with a TBU.
(83) L → TBU: the L tone is associated with a TBU.


In our case, a relative ranking of (σ')ω' ← T >> σ' ← T >> H → TBU >> NoAssoc >> σ
← T >> L → TBU will ensure that stressed syllables in strong prosodic words (and
preferably also all stressed syllables) are associated with a tone (either H or L), but
not all syllables. Additionally, if an H tone ends up in a position where it can only
associate with an unstressed syllable, it will still do so, whereas an L tone in the
same position will remain unassociated.

The alignment constraints (84)-(90) make reference to the position of a tone
within a constituent or relative to another tone. Like the tonal association
constraints, they make reference to a specific tone, L or H, indexed as belonging to the
level of the phonological or intonational phrase (in this section, all constraints refer
to PhP tones; explicit indexation will be used in section 5.3.3 when IP-level tones are
added). They are violated incrementally in a stepwise fashion: a right-edge
alignment constraint, for example, is satisfied if the tone it aligns is the rightmost tone in
the constituent, and if its right edge is aligned with the right edge of the constituent
(phonetically, if the pitch minimum or maximum takes place within the rightmost
sonorant segment of the constituent, cf. Gussenhoven 2000a: 133); it incurs one
violation for each tone that interferes with its alignment and for each TBU further
away from the rightmost one (cf. Gussenhoven 2004: 151). Each directed alignment
constraint has a directly opposed counterpart (i.e. a constraint right-aligning a tone
has a counterpart left-aligning the same tone); below, we will list only those that
are relevant for the analysis here. If their counterpart is not shown, it is taken to
be ranked far below. If a tone A is associated with a particular position, it is also
aligned there, i.e. it will block the full alignment of another tone B if that tone is to
one side of it but seeks to align to the other and both the association constraint of
tone A and the faithfulness constraint LINEARITY are ranked higher than B’s
alignment constraint (cf. Gussenhoven 2004: 155). Alignment constraints can also align
tones with each other, or with a position in the prosodic structure, e.g. a stressed
syllable (Gussenhoven 2004: 156–159). I follow Gussenhoven (2004: 156) in
assuming that edges of prosodic constituents are not TBUs and tones can therefore not
associate with them (cf. also section 3.5.2); alignment constraints thus suffice to
position tones at constituent edges. However, if a tone is aligned with an edge, there
is a free TBU available there (i.e., the final syllable) and if a constraint is active that
demands for that tone to be associated, then it will not only align with the edge but
also associate with the TBU (in our case this holds for H but not for L).


(84) ALIGN (L, Ltφ): Align (the left edge of) L with the left edge (of the leftmost
syllable) of the phonological phrase.

(85) ALIGN (L, Rtφ): Align (the right edge of) L with the right edge (of the rightmost
syllable) of the phonological phrase.

(86) ALIGN (H, Ltφ): Align (the left edge of) H with the left edge (of the leftmost
syllable) of the phonological phrase.

(87) ALIGN (H, Rtφ): Align (the right edge of) H with the right edge (of the rightmost
syllable) of the phonological phrase.

(88) ALIGN (H, σ'): Align (the right edge of) H with the right edge of a stressed
syllable.

(89) LINEARITY: The sequence of tones in the output is the same as in the input.

(90) NOTARGET: A tone does not form a target (cf. Gussenhoven 2004: 155).


NOTARGET functions in parallel to NoAssoc: it prevents a tone from satisfying two
or more alignment constraints (multiple alignment) if it is ranked between the
constraints. Thus, ALIGN (T, σ') >> NOTARGET >> ALIGN (T, Rtφ) will ensure that even if T
does align with σ', it will not form a stretch extending to the right edge of the phrase.
The prevention of multiple alignment can also be achieved by restricting the
alignment of one tone via that of another. The two faithfulness constraints LINEARITY
and MAXIO(T) as well as NOCROWD are here ranked above all the other constraints.
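
The incremental evaluation of the directed alignment constraints can be made concrete
with a small sketch. It is only an illustration under simplifying assumptions: TBUs are
numbered from 0, each tone is reduced to the TBU indices of its left and right edge, and
one violation is counted for every tone standing between the tone and the relevant phrase
edge plus one for every TBU of distance, following the description given above. The names
Tone, align_left and align_right are mine and do not come from Gussenhoven (2004).

```python
from dataclasses import dataclass


@dataclass
class Tone:
    label: str
    left: int   # index of the TBU with which the tone's left edge is aligned
    right: int  # index of the TBU with which its right edge is aligned
                # (left == right unless the tone is multiply aligned)


def align_right(tones, i, n_tbus):
    """Violations of right-edge alignment for tones[i] in a phrase of n_tbus TBUs."""
    blocking_tones = len(tones) - 1 - i           # tones standing further right
    tbu_distance = (n_tbus - 1) - tones[i].right  # TBUs between right edge and phrase end
    return blocking_tones + tbu_distance


def align_left(tones, i):
    """Violations of left-edge alignment for tones[i]."""
    blocking_tones = i              # tones standing further left
    tbu_distance = tones[i].left    # TBUs between phrase start and left edge
    return blocking_tones + tbu_distance


# Phrase of four syllables with H on the penult (index 2) and L before it.
single_target = [Tone("L", 1, 1), Tone("H", 2, 2)]  # L forms one target next to H
low_stretch = [Tone("L", 0, 1), Tone("H", 2, 2)]    # L multiply aligned leftwards

for tones in (single_target, low_stretch):
    print(align_left(tones, 0), align_right(tones, 0, n_tbus=4))
# prints "1 3" for the single target and "0 3" for the low stretch
```

In this toy case the low stretch satisfies left alignment completely without becoming
worse on right alignment, so it is preferred unless a constraint against multiple
alignment such as NOTARGET outranks the left-alignment constraint.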


5.3.2.1 Main variant Spanish 

With the constraints introduced, we can now proceed to show that the same
ranking of them produces the correct result for both the “main” variant and the
“phrase accentuation” variant that still forms peaks aligned with stressed syllables,
i.e. the one schematically represented by c) in Table 10 from section 5.1.3.1, and
exemplified by the behaviour of speaker ZE55 in the double topic-utterances. I will
refer to it as “ZE55’s phrase accentuation” as a shorthand.


(91) Association and alignment constraint ranking for “main” variant and ZE55's
“phrase accentuation”
ALIGN (H, σ') >> (σ')ω' ← T >> H → TBU >> σ' ← T >> ALIGN (L, Rtφ) >> ALIGN (L, Ltφ)
>> ALIGN (H, Ltφ) >> NoAssoc >> NOTARGET >> ALIGN (H, Rtφ)


In this constraint ranking and with the tone sequences we are dealing with here,
ALIGN (L, Rtφ) and ALIGN (L, Ltφ) are in direct competition and cause L to be as
right-aligned as possible (i.e. always directly next to the next H), but also forming
low stretches towards the left. Since multiple alignment is not constrained for L
and σ' ← T is ranked above NoAssoc, the L could in principle associate with
available stressed TBUs it thus reaches (this is spreading²⁴³ according to Gussenhoven
2000a, 2004: 153–155). This doesn't happen because there is one H for every
stressed syllable and the high-ranked ALIGN (H, σ') together with NOCROWD ensures that
an H associates with a stressed syllable, rather than an L. ALIGN (H, Rtφ), which
will become important in the phrase accentuation variants by NQ01 and XJ45, is
here ranked below NOTARGET, rendering it ineffective. Because the edge-alignment
constraints for L are both higher-ranked than any edge-aligning constraints for
the H tone, they effectively prevent the H from multiple alignment and spreading
which would result in plateau realizations (see candidate (f) in the tableau for
ZE55's phrase accentuation (Table 18)). While low stretches due to L aligning
multiply occur very regularly, high plateaus extending between two accented syllables


243 The terminology is slightly confusing, because what enables spreading is the simultaneous
satisfaction of opposing alignment constraints, but technically a tonal phenomenon can only be
called spreading when the tone associates with more than one TBU (Gussenhoven 2004: 217).
Consequently, NOTARGET is separate from NoAssoc. In many of the contours observed in this work,
multiple alignment of tones is active, but there is less evidence that this also involves spreading
(the association of the tone with more than one TBU it thus covers).

were indeed far less frequently found (cf. section 5.1.1.2). They are therefore not
the main focus here, but of some interest because the plateau-forming behaviour
is very frequent in the Quechua data. Whether they are allowed or disallowed is
decided by the relative ranking of ALIGN (L, Rtφ) and ALIGN (H, Ltφ), respectively
(as long as they're both above NOTARGET). If ALIGN (L, Rtφ) is ranked above ALIGN
(H, Ltφ), then it will be the L that forms a low stretch, whereas if they swap places,
high plateaus will be formed. This is only possible when there is only a single
minimal tone sequence in the phonological phrase, because with several, as in the
“main” variant, the faithfulness and association constraints would prevent a single
H from aligning multiply across several stressed syllables at the expense of other
tones. As an interesting side effect, the plateau-like realization (as found in section
5.1.1.2) thus turns out to be connected with the phrase accentuation. This seems
intuitively plausible since they both are a step closer to the prosodic configuration
most prevalent in the Quechua data than the “main” variant. However, since the
plateau realization is quite rare in the Spanish data, I will continue the further
analysis with the assumption that ALIGN (L, Rtφ) is ranked above ALIGN (H, Ltφ),
disallowing its formation. The full ranking is then as follows:


(92) Full constraint ranking for “main” variant and ZE55's “phrase accentuation”
LINEARITY >> MAXIO(T) >> NOCROWD >> ALIGN (H, σ') >> (σ')ω' ← T >> H → TBU
>> σ' ← T >> ALIGN (L, Rtφ) >> ALIGN (L, Ltφ) >> ALIGN (H, Ltφ) >> NoAssoc
>> NOTARGET >> ALIGN (H, Rtφ) >> L → TBU >> σ ← T


The constraints not given in bold in (92), i.e. the high-ranking first three constraints
(LINEARITY, MAXIO(T), and NOCROWD) and the two low-ranking last ones (σ ← T
and L → TBU) will not normally be given in the tableaus below, in order to make
them less crowded. The tableaus show that this ranking produces the expected
results both when there are as many LH sequences as there are prosodic words,
in the “main” variant (Table 17) and when there is only one LH sequence per
phonological phrase in the “phrase accentuation” variant by speaker ZE55 (Table 18).
In Table 17, six candidates are given for the “main” variant. The first three (a-c)
all satisfy the four highest-ranking constraints ALIGN (H, σ'), (σ')ω' ← T, H → TBU and
σ' ← T. They differ in the way they satisfy ALIGN (L, Ltφ) and ALIGN (L, Rtφ): candidate
a) satisfies ALIGN (L, Rtφ) at the expense of ALIGN (L, Ltφ), producing a low target
only on the syllable before the final stressed syllable, candidate b) the other way
around, with only one low target directly after the first stressed syllable. Candidate
c), the optimal candidate, satisfies both as much as possible: two low targets are
produced, directly after the first and directly before the final stressed syllable, with
a low stretch between them: this is indeed what we observe in the “main” variant,
with pitch normally falling rapidly after the peak on a stressed syllable and staying

Table 17: OT-tableau with the constraint rankings to arrive at the correct association and alignment
behaviour in the phonological phrase for the “main” variant of Spanish. Brackets are around prosodic
words, accents mark the strongest element within its domain (syllable within prosodic word and
prosodic word within phonological phrase); lines signify association between a tone and a syllable,
dashed arrows indicate multiple alignment of a tone. Black dots are tonal targets.

low until right before the next one. The last three candidates (d-f) all fail to be
selected because they violate ALIGN (H, σ') and H → TBU, thus preventing a realization
with the middle L and H as floating central tones (d), H as right-aligned boundary
tone (e) or one that could be annotated as L*H (f).


5.3.2.2 Phrase accentuation Spanish variant 1 (exemplified by speaker ZE55) 

For the “phrase accentuation” variant produced by ZE55, corresponding to c) in
Table 10, the same constraint ranking holds, as shown in Table 18. Candidate a) is
the optimal one, with the H associating with the strongest syllable and the L again
forming two elbow targets, one closely aligned before the H, and another on the
other stressed syllable, with which it associates. This makes the assumption that an
association constraint prefers an alignment of at least one edge of a tone with the
corresponding edge of the TBU it associates with. Without this assumption,
candidate b) would actually be preferred, because ALIGN (L, Ltφ) would incur one
violation mark less. In fact, neither for the realizations by ZE55 nor the phrase
accentuation utterances discussed in section 5.1.3.1 is the contour represented by candidate
a) the only one attested; the one represented by candidate b) also seems to occur.
This is difficult to assess with certainty, also because L tone targets are notoriously
difficult to identify and because it then becomes partially a matter of theoretical
preference whether the L should be seen as associated or not. It seems plausible
that the L tone associates variably, due to σ' ← T having a ranking distribution that
here significantly overlaps with that of ALIGN (L, Ltφ) (cf. Boersma & Hayes 2001).
The tableau thus displays the ranking which is probably selected more
frequently.²⁴⁴ Candidate (f), as mentioned above, displays the plateau realization which is
here prevented by the ranking of ALIGN (L, Rtφ) over ALIGN (H, Ltφ). If their ranking
were reversed, candidate (f) would win, generating the realization observed
occasionally in section 5.1.1.2 and also likely on the second topics in the double
topic-utterances by ZE55, NQ01, and XJ45.
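
The idea of overlapping ranking distributions (cf. Boersma & Hayes 2001) can be
illustrated with a short simulation. This is a sketch under assumptions that are not
part of the analysis: the ranking values 100.0 and 99.0 are invented for illustration,
the evaluation noise is modeled as independent Gaussian noise with a standard deviation
of 2.0, and the function name share_c1_over_c2 is mine. C1 stands for the constraint
that is ranked higher on the continuous scale and C2 for its close neighbour.

```python
import random


def share_c1_over_c2(rank_c1, rank_c2, noise_sd=2.0, trials=10_000, seed=1):
    """Estimate how often C1 outranks C2 at evaluation time.

    At every evaluation each constraint receives a selection point:
    its ranking value plus random noise. The constraint with the
    higher selection point dominates in that evaluation.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    wins = 0
    for _ in range(trials):
        point_c1 = rng.gauss(rank_c1, noise_sd)
        point_c2 = rng.gauss(rank_c2, noise_sd)
        wins += point_c1 > point_c2
    return wins / trials


close = share_c1_over_c2(100.0, 99.0)  # strongly overlapping distributions
far = share_c1_over_c2(100.0, 80.0)    # practically categorical ranking
print(close, far)
```

With these hypothetical values the difference between the two selection points is
normally distributed with mean 1 and standard deviation 2√2, so C1 >> C2 is expected in
roughly 64% of the evaluations, whereas a distance of 20 units yields an effectively
categorical ranking.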


244 This is modeled in Boersma & Hayes (2001: 47–49) by assuming that the position of a constraint
on a continuous scale has a random perturbance range so that at the moment of production, a
selection point on the scale for the constraint is generated that is different from its position. If two
constraints C1 and C2 are closely adjacent on the scale, this leads to a majority of outcomes
exhibiting C1 >> C2 (if C1 is ranked higher), while a minority exhibit C2 >> C1, with their frequency ratio
determined by their proximity on the scale.

Table 18: OT-tableau with the constraint rankings to arrive at the correct association and alignment
behaviour in the phonological phrase for ZE55's phrase accentuation variant of Spanish. Brackets
are around prosodic words, accents mark the strongest element within its domain (syllable within
prosodic word and prosodic word within phonological phrase); lines signify association between a
tone and a syllable, dashed arrows indicate multiple alignment. Black dots are tonal targets.


5.3.2.3 Phrase accentuation Spanish variant 2 (exemplified by speakers XJ45
and NQ01)

As a next step, we proceed to the next “phrase accentuation” variant that XJ45
displays in his double topic-utterances, and which in non-final phrases corresponds
to d) in Table 10 from section 5.1.3.1. In order to generate the correct output for
the phrase accentuation variant by XJ45 and NQ01, a crucial reranking has to be
undertaken. This variant regularly has final rises at the end of the phonological
phrase instead of peaks on the metrically strongest syllable. This central difference
is achieved by moving ALIGN (H, Rtφ) from its position below NoAssoc and
NOTARGET to the top of the ranking as shown, i.e. above (σ')ω' ← T (but still below LINEARITY,
MAXIO(T) and NOCROWD, which are not shown in the tableaus), and by moving ALIGN (H, σ') to a
position further below. This results in the ranking (93).


(93) Association and alignment constraint ranking for XJ45’s “phrase accentuation”
variant
ALIGN (H, Rtφ) >> (σ')ω' ← T >> H → TBU >> σ' ← T >> ALIGN (H, σ') >> ALIGN (L,
Rtφ) >> ALIGN (L, Ltφ) >> ALIGN (H, Ltφ) >> NoAssoc >> NOTARGET


Table 19 gives a tableau with this ranking²⁴⁵ showing that this will indeed pick out
the correct candidate: candidates a) and b), which were very good in ZE55's variant,
are now out because they violate the high-ranking ALIGN (H, Rtφ). Candidate c) does
not do that, but since it does not associate any of the tones, it is discarded because
of violation of the still high-ranking (σ')ω' ← T and σ' ← T. Candidate d), on the other
hand, is optimal: it aligns the H tone at the right edge of the phonological phrase,
allowing it to associate with the available TBU there and leaving the preceding L
tone to associate with the strongest syllable and spread leftwards, also associating
with the other stressed syllable and forming a second target there. Candidate d) is
preferred over candidate e) again because of the assumption we made above about
associations preferring tonal targets; however, also for XJ45 deciding between the
contours of d) and e) is not always straightforward from the data.

We can now model the final attested step for Spanish, namely the kind
of “phrase accentuation” that NQ01 produces in her double topic-utterances and
elsewhere. Like XJ45, she does not produce peaks on strong syllables in non-final
phonological phrases, but instead final rises. Apart from the behaviour in IP-final
phonological phrases, which will be treated in the next section, there is only one
noticeable difference between XJ45 and NQ01 in the ranking of the constraints we
have looked at so far: NQ01 never seems to align elbows with any of the pre-final


245 NOTARGET is not displayed in the tableaus from here on.

Table 19: OT-tableau with the constraint rankings to arrive at the correct association and alignment
behaviour in the phonological phrase for speaker XJ45's phrase accentuation variant of Spanish.
Brackets are around prosodic words, accents mark the strongest element within its domain
(syllable within prosodic word and prosodic word within phonological phrase); lines signify
association between a tone and a syllable, dashed arrows indicate multiple alignment. Black dots are
tonal targets.

stressed syllables. Instead, the low stretch in her phonological phrases spreads
uniformly from their left edges to directly before the rise on the final syllable. I
therefore assume that candidate e) in Table 19, which is dispreferred under the
assumption of association positions preferring targets, is actually the normal outcome
candidate for her. In order to arrive at a ranking that will uniformly produce that
output while still upholding that assumption, σ' ← T is moved further down the
ranking, below NoAssoc. This positions this ranking even closer to Quechua, where
σ' ← T plays no role in any variant (see section 6.3).


(94) Association and alignment constraint ranking for NQ01’s “phrase
accentuation” variant
ALIGN (H, Rtφ) >> (σ')ω' ← T >> H → TBU >> ALIGN (H, σ') >> ALIGN (L, Rtφ) >>
ALIGN (L, Ltφ) >> ALIGN (H, Ltφ) >> NoAssoc >> NOTARGET >> σ' ← T


With this ranking, the selection of candidate e) from Table 19 is assured, whether the
assumption holds or not. Note that contours generated by this constraint ranking
would be very hard to differentiate from ones in which association constraints
play no role at all: the position of the tones is entirely determined by alignment
constraints. From a Spanish perspective, it makes sense to maintain the notion of
association here, since we have good reasons to assume that H tones associate with
stressed TBUs in “main” variant Spanish. The theoretical consequence is therefore
to retain the assumption of association in the “phrase accentuation” variants,
especially if we want to trace the difference between the variants in terms of minimally
necessary steps. However, at this stage, the matter becomes less easy to determine
empirically, since differences would only emerge in quantified positional
measurements of the turning points and effects on other acoustic correlates of the kind
done in section 6.1.6 on Quechua rise-falls and Spanish main variant paroxytones,
and the results, as they are there, would still be open to some interpretation. Here
I will assume that the above ranking with association is valid for at least some of
the Spanish utterances. In section 7.4 we will then consider what happens when we
remove the association constraints.
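
The claim that the variants differ by minimal reranking steps can be made explicit by
treating each ranking as an ordered list and each step as the movement of a single
constraint. The sketch below is only a bookkeeping aid. The ASCII labels are my own
shorthand for the constraints in (91), (93) and (94), and the helper functions move and
outranks are hypothetical names introduced for this illustration.

```python
# Shorthand labels (assumed for this sketch):
#   ALIGN-H-STRESS = ALIGN (H, stressed syllable)      ASSOC-STRONG = (77)
#   ASSOC-STRESS   = (78)                              H-TO-TBU     = (82)
#   ALIGN-x-RIGHT / ALIGN-x-LEFT = phrase-edge alignment of tone x, cf. (84)-(87)
R91 = ["ALIGN-H-STRESS", "ASSOC-STRONG", "H-TO-TBU", "ASSOC-STRESS",
       "ALIGN-L-RIGHT", "ALIGN-L-LEFT", "ALIGN-H-LEFT",
       "NOASSOC", "NOTARGET", "ALIGN-H-RIGHT"]


def move(ranking, constraint, before=None):
    """Return a new ranking with `constraint` placed directly above `before`.

    If `before` is None, the constraint is demoted to the bottom.
    """
    rest = [c for c in ranking if c != constraint]
    pos = len(rest) if before is None else rest.index(before)
    return rest[:pos] + [constraint] + rest[pos:]


def outranks(ranking, a, b):
    """True if constraint a dominates constraint b in the ranking."""
    return ranking.index(a) < ranking.index(b)


# Step from ZE55's ranking (91) to XJ45's ranking (93): two movements.
step1 = move(R91, "ALIGN-H-RIGHT", before="ASSOC-STRONG")
R93 = move(step1, "ALIGN-H-STRESS", before="ALIGN-L-RIGHT")

# Step from (93) to NQ01's ranking (94): one demotion.
R94 = move(R93, "ASSOC-STRESS")

print(" >> ".join(R93))
print(" >> ".join(R94))
print(outranks(R93, "ASSOC-STRESS", "NOASSOC"))  # True
print(outranks(R94, "ASSOC-STRESS", "NOASSOC"))  # False
```

Read this way, (93) is two movements away from (91) and (94) one further demotion away
from (93). The last two lines show the one property that separates XJ45 from NQ01 in
this model: whether the constraint associating a tone with every stressed syllable still
dominates NoAssoc.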


5.3.3 IP-final behaviour 


Things become slightly more complex once IP-boundary tones get thrown into the
mix. The limits of what the discrete OT-model can faithfully represent will become
evident. However, we will also see that it allows us to generalize between cases that
might seem disparate at first. First of all, there is the case where the IP-boundary
tone is realized at the right edge of the phrase while the H tone of the phonological
phrase forms a separate peak on the preceding stressed syllable (independent of
whether the IP tone is an L or a H itself): we can assume that this is achieved by a
constraint ALIGN (Tι, Rtι), which aligns a tone provided by the intonational phrase
with the right edge of that phrase (which is also the right edge of the rightmost
phonological phrase). If this constraint is ranked above the other association and
alignment constraints, then in cases of IP-final paroxytones, this will lead to the
observed outcomes, not only for the main variant but also for the phrase
accentuation variants: all the other constraints so far discussed staying as they are, this
will result in the Tι aligning at the right edge of the phrase. Since the H tone of the
phonological phrase cannot now align with the last TBU anymore (the edge of the
IP and that of the rightmost phonological phrase being the same), it will move one
syllable to the left onto the strongest metrical syllable with which it associates, and
the preceding L is also “pushed” further back, possibly associating with one of the
available other stressed syllables and satisfying its alignment constraints as well
as possible. For ZE55 and the “main” variant, since ALIGN (H, Rtφ) is not ranked
high anyway, no competition takes place between the H tone of the phonological
phrase and the boundary tone of the IP (as long as the final word is paroxytonic, see
below), but the result is the same for the more edge-prominent variants of “phrase
accentuation” (cf. e.g. XJ45's ELQUD ES 16 (Figure 63) and 19B (Figure 64) in section
5.1.3.1.3). This shows that in the phrase accentuation variant creating final rises in
non-final PhPs, the H tone of the PhP is not really fixed at either the phrase edge or
the strongest syllable: when no other tone intervenes it will align rightmost, but if
it cannot, it will still align and associate with the strongest position. This provides
support for the idea that tones are really autonomous on their tier and not
predestined for any kind of association. It also demonstrates that pitch contours appearing
to be the same or very similar can be generated from various rankings. The same
contour also occurs in the Quechua data, where it is again generated from different
rankings. ALIGN (Tι, Rtι) also has a counterpart, ALIGN (Tι, Ltι). This is ranked below
the other constraints, but its effect emerges in cases of deaccentuation (see below).
The effect that the competition between the two tones for the rightmost position
has is necessarily enabled by NOCROWD (although it doesn't capture the full
phenomenon, see below). The ranking is given in (95).


(95) Constraint ranking including IP tones
NOCROWD >> ALIGN (Tι, Rtι) >> Association and alignment ranking for main /
phrase accentuation >> ALIGN (Tι, Ltι)


The same constraint ranking also generates the attested outcomes in utterances
with final oxytones and with deaccentuation. In utterances with IP-final oxytonic
or monosyllabic words, the competition between the IP boundary tone Lι and
the last H tone of the phonological phrase that we just discussed for IP-final
paroxytones in the phrase accentuation variant also takes place in the main variant. An
example in an utterance that is in the main variant is (38) / Figure 21 from section
5.1.1; an example from an utterance with phrase accentuation by NQ01 is given in
Figure 111.


Figure 111: NQ01 ELQUD ES 60²⁴⁶ (la mamá le da de comer a su bebé ‘the mother gives her
baby to eat’).


What we observe in both cases is that the IP-final stressed syllable realizes a pitch
peak, followed by a fall. This suggests that both tones are realized on this last
syllable, with the right-alignment of the Lι winning out against that of the H. This is
exactly what the constraint ranking would predict for oxytonic or monosyllabic
final words: ALIGN (H, σ') and ALIGN (H, Rtφ) now seek alignment for the H to the
same TBU and are both equally outstripped by ALIGN (Tι, Rtι). But H will still align as
far to the right as possible. Both (σ')ω' ← T and H → TBU effect an association of the H
with the stressed syllable, while Lι remains an unassociated boundary tone, there
being no constraint forcing its association. The two utterances exemplify, however,
that this account leaves out relevant aspects of what is observable here. In Figure
21, the H tone clearly wins out over the Lι in terms of scaling and
(impressionistically) perceptual salience, while it is the other way around in Figure 111. The tones
are clearly in competition under adverse temporal conditions.


246 https://osf.io/d46st/ 


Figure 112: XJ45 ELQUD ES 60²⁴⁷ (la mujer le está dando comida al niño de un tazón ‘the woman is
giving food to the child from a mug’).


XJ45's ELQUD ES 60 (Figure 112) is an even more radical example of the H tone
winning out over the L, which here arguably does not get realized at all. For utteran- 
ces like this one (recall also NQ01's ELQUD ES 65/Figure 104), we could assume that 
the relative ranking between ALIGN (H, Rt;) and ALIGN (T, Rt) is actually reversed, 
with the former now outranking the latter, so that the L, aligns to the left of the 
H, with its realization then becoming difficult to detect in the low stretch that is 
realized anyway before the peak of the H. This is parallel to Gussenhoven (2000a)'s 
proposal for Roermond Dutch (cf. (70)). The nonperipheral realization of boundary 
tones is perhaps not so strange now that we have witnessed the malleable roles 
tones can assume throughout the discussion. To my mind, a much graver problem 
is that modelling this via a discrete reranking of the two constraints obscures the 
fact that many of the phenomena involved are essentially gradual. Two of the
resulting processes when several tones compete for the same temporal window have
been called truncation and compression in the literature, truncation signifying a 
pitch movement that is incompletely realized, and compression one that is com- 
pletely realized but with reduced temporal span. Arguably in Figure 21 the H is 
compressed and the L, slightly truncated; in Figure 111, it is the H whose scaling 
is considerably truncated, and in Figure 112 the L, is fully truncated. The two pro- 
cesses have often been seen as mutually exclusive strategies that languages adopt 
wholesale, i.e. as a typological dichotomy that separates languages into 'trunca- 
ting’ and ‘compressing’ ones. This view is espoused by Ladd (2008: 182), although 
it is already noted there that “the distinctions are by no means clear-cut”. Indeed, 
Prieto & Ortega-Llebaria (2009)’s results indicate that individual differences seem 
to play a substantial role: one of their four peninsular Spanish speakers shows a
much greater preference for truncating the LH*L% contour in IP-final oxytones
under narrow focus than the other three (who prefer compensatory lengthening 
of the final syllable). Two further studies shed a more discriminating light on the 
phenomena. Cho & Flemming (2015) show for Seoul Korean that on the one hand, 
compression can gradually lead to truncation: with increasing time pressure (faster 
speech rate), the second L tone of the LHLH tune in the accentual phrase (AP) will 
first be increasingly undershot and then not realized at all even in four-syllable 
APs. On the other hand, they also attest that there is a form of categorical trunca- 
tion: in two- or three-syllable APs, the same second L tone will never be realized, no 
matter how slow the speech rate. Although both phenomena target the same tone, 
they assert that only the latter, but not the former, constitutes a case of phonological 
tone deletion (Cho & Flemming 2015: 379). Rathcke (2016) compares German and 
Russian in situations where pitch accents HL*, H*, and L*H on IP-final paroxytones 
and oxytones with a variably voiced or unvoiced coda in the stressed CVC syllable 
were followed by an L% boundary tone. Her results demonstrate that the two lan- 
guages, previously both classified simply as ‘truncating’, employ both compressing 
and truncating strategies as well as temporal re-alignment of the tones, but do so 
differently from each other, and to different degrees (up to and including catego-
riality) in relation to the experimental conditions (cf. Rathcke 2016: 221–223). She
also shows that in both German and Russian the phonological status of a tone is
relevant for which strategy is employed: in the H* L% and L*H L% tunes under
the severest time pressure conditions, the final fall was extremely undershot, trun- 
cating the target of the L boundary tone, while in the HL* L% tune, it was mostly 
preserved or only slightly undershot, leading to the conclusion that L tones that 
are part of pitch accents are preferentially preserved over those that are boundary 
tones (Rathcke 2016: 223). Both Cho & Flemming (2015) and Rathcke (2016) paint a 
very nuanced picture of truncation and compression, and concur on the point that 
these processes do not affect all tones equally: if a tone is affected in Seoul Korean, 
it is the AP's second L tone, and in German and Russian it is the L tone that signals 
the IP boundary. Recall that the most likely analysis proposed for neutral polar 
questions in Huari Spanish (section 5.1.2.1) in contrast requires that the realiza- 
tion of boundary tones wins out over that of the final pitch accent under crowding 
conditions, so that the formal differentiation of that utterance type is preserved. 
This kind of targeting behaviour in principle squares well with a constraint-based 
approach, in which the faithful preservation of the different tones is ranked against
each other, so that under adverse conditions, one of them will be preferentially 
reduced or eliminated. 
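
The difference between truncation and compression as defined earlier in this paragraph, and the idea that real outcomes may lie between the two, can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration: the linear pitch movement, the parameter name `compression_share`, and the Hz and duration values are hypothetical and do not come from the data or from the studies cited.

```python
def realized_end(start_hz: float, target_hz: float, full_dur: float,
                 available: float, compression_share: float) -> float:
    """Pitch reached at the end of a linear movement under time pressure.

    compression_share = 0.0 models pure truncation: the slope is kept and
    the movement is cut off, so the target is not reached.
    compression_share = 1.0 models pure compression: the target is reached
    in the shorter time, so the slope becomes steeper.
    Values in between model gradual mixtures of the two.
    """
    if full_dur <= 0 or available < 0:
        raise ValueError("durations must be positive")
    if not 0.0 <= compression_share <= 1.0:
        raise ValueError("compression_share must lie between 0 and 1")
    if available >= full_dur:
        return target_hz  # no time pressure: the target is reached anyway
    # End point if the original slope is kept and the movement is cut off.
    truncated = start_hz + (target_hz - start_hz) * (available / full_dur)
    # Move from the truncated end point towards the full target.
    return truncated + compression_share * (target_hz - truncated)


# A hypothetical 200 Hz to 100 Hz fall that needs 0.2 s but only gets 0.1 s.
pure_truncation = realized_end(200.0, 100.0, 0.2, 0.1, 0.0)
pure_compression = realized_end(200.0, 100.0, 0.2, 0.1, 1.0)
mixed = realized_end(200.0, 100.0, 0.2, 0.1, 0.5)
```

A single continuous parameter is of course too crude for the tone-specific and partly categorical behaviour reported in the studies above; the sketch only shows that the two strategies can be treated as end points of a scale rather than as a dichotomy.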

Yet the preferential and gradual nature is not really captured in our constraints 
here, demonstrating the limitations of this essentially discrete modelling. Equally, 
the existing competition between the two tones is only poorly captured by a constraint
like NOCROWD, which only forbids the association of two tones with the same
TBU (cf. Gussenhoven 2004: 149), but does not say anything about a case like this 
one, where the tones are in competition even though probably only one of them 
(the H) seeks to associate, and yet can be seen to lose out in some cases in terms 
of relative contrast. This matter will not be resolved here, but it should be unders- 
tood that realistic versions of the constraints involved here probably would need 
to refer to gradual differences in articulatory and perceptual contrastiveness of 
pitch movements under differing temporal conditions. Yet at the same time there 
is also an element of categoriality involved, since adverse temporal conditions are 
not simply those with increased speech rate: in Cho & Flemming (2015), increase in 
speech rate does increase the undershoot for the second L tone, but it is the more
phonologically-mediated increase in time pressure in the form of having to realize 
four tones in less than four syllables that categorically deletes it. 

Keeping this in mind, the same constraint ranking (with ALIGN (T, Rt) domi- 
nating ALIGN (H, Rt4)) can also be maintained for deaccentuation. We saw that 
deaccentuation occurs occasionally in the Huari Spanish data, and even though its 
occurrence is likely somewhat subject to preference, it seems most likely to occur 
on postnuclear material, i.e. when the strongest metrical position is prefinal in a 
structure. An attempt at a definition is given in (96), based on a similar formulation 
in Féry (2017: 155): 


(96) If a non-final constituent of a (minimal) IP receives highest prominence, then
any constituents of the same level following it are not assigned any tones at 
the level of the phonological phrase (deaccentuation) 
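
Read procedurally, the definition in (96) amounts to a simple rule over the constituents of an IP. The following Python sketch is one possible reading under stated assumptions: constituents are given as a flat list of labels, the position of highest prominence is supplied by the caller, and the function name and example labels are hypothetical.

```python
from typing import List, Tuple


def phrase_tone_domains(constituents: List[str],
                        nucleus: int) -> List[Tuple[str, bool]]:
    """Mark which constituents receive phonological-phrase tones.

    Constituents up to and including the most prominent one receive tones;
    any constituent following it does not (deaccentuation). If the most
    prominent constituent is final, nothing is deaccented.
    """
    if not 0 <= nucleus < len(constituents):
        raise IndexError("nucleus must be an index into constituents")
    return [(label, index <= nucleus)
            for index, label in enumerate(constituents)]


# Hypothetical IP with three constituents and prominence on the second one.
non_final = phrase_tone_domains(["A", "B", "C"], nucleus=1)
final = phrase_tone_domains(["A", "B", "C"], nucleus=2)
```

The rule is deliberately categorical, so it says nothing about the gradual variation between reduced scaling and a fully flat realization discussed in the surrounding text.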


Just as in the phrase accentuation, the realization of the postnuclear pitch accents 
in cases of deaccentuation varies gradually between reduction in scaling and dele- 
tion (realization as flat). In section 5.1.3.2, I noted that it seems more likely that a 
totally flat pitch contour is a variant realization for accents with reduced scaling 
than to assume that phonologically fully deleted accents should manifest a variant 
realization where tones are only somewhat reduced in scaling. To capture this in a 
general account, the competition between the constraints aligning the L tones left- 
wards here and faithfulness constraints preserving the pitch accents would have 
to be allowed to have gradually variable outcomes.²⁴⁸ The constraints here are not


248 Note however that such a gradual competition could very well accommodate the notion resul- 
ting from the tone suppression constraints in section 5.3.1, that there are tones in principle avai- 
lable even for each syllable. Effectively this is a conception in which the different levels of the pro- 
sodic hierarchy are in constant competition with each other about the creation of articulatory and 
perceptual contrasts, with in principle gradual outcomes.


fully equipped to deal with this gradual variation in scaling. However, assuming
that (96) captures (an aspect of) the phenomenon truthfully at some (abstract) 
level, this allows for a generalized treatment of deaccented and non-deaccented 
utterances in the following fashion:²⁴⁹ in the postnuclear stretch, there won't be
any phonological phrase tones available. However, since this is still part of an IP, 
there will be an IP tone available (in most cases an L), aligning to the right. Earlier 
we said that its counterpart, ALIGN (T, Lt,), needs to also be included in the ranking, 
although below all the other constraints considered. Because all other constraints 
ranking above it that block its effect refer to phonological phrase tones, it is now 
free to act here and its presence then accounts for the observable formation of a 
low elbow in these cases directly following the nuclear syllable on which the last 
H tone is realized, often causing quite abrupt falls (but cf. Barnes et al. (2010) for 
an alternative account). And there is in fact evidence that this constraint is also 
active in utterances without deaccentuation, coming from utterances with IP-final 
proparoxytones, such as Figure 23 in section 5.1.1 or Figure 113, where the pitch on 
the last word forms a peak on the stressed syllable and then drops abruptly at the 
beginning of the penult as in the cases with deaccentuation, forming a low stretch 
from there to the end of the phrase. 


Figure 113: ZE55 ELQUD ES 51²⁵⁰ (no fue el último ‘no it was the last one’).


249 The way it is formulated, (96) also applies equally well to cases of both deaccentuation and de-
phrasing, unifying them somewhat (cf. section 3.4.6). It also preserves the notion that the strongest
(nuclear) accent in any phrase from the minimal IP-level upwards is always rightmost (that is what 
deaccentuation makes sure of), but it assumes a recursive prosodic structure in order to do so. 
250 https://osf.io/uxenr/ 




Summing up, this section has used OT to demonstrate that the observed intona- 
tional variants of Huari Spanish can be analyzed as differing prominently along 
two dimensions, that of tone distribution per domain and that of tonal alignment 
and association. It was shown that via a systematic reranking of a limited number 
of constraints, Huari Spanish utterances move from one end of the spectrum, in 
which prosodic words are optimized by realizing an LH* pitch accent on each stres- 
sed syllable, to the other, in which phrases comprising several words are optimized 
via edge-seeking tones; with several attested positions in between also falling out 
from this stepwise reranking process. The analysis did not extend to two additional 
(and related) dimensions, tonal scaling and crowding, even though it was found 
that they are very relevant to the intonational characterization of Huari Spanish. 
Yet their continuous nature does not lend itself to an essentially categorical (even 
if stepwise gradable) analysis such as the one employed. This is in itself a noteworthy
result. Still, the analysis has provided the Spanish half to answering the third group 
of research questions laid out in section 4, especially (36)b. In the next chapter (6), 
we will explore the intonation of Huari Quechua, including with an OT-analysis 
(section 6.3) that provides the other half of the answer to that question. More com- 
prehensive conclusions will then follow in the final chapter (7). 


6 Huari Quechua 


This chapter describes the intonation of Huari Quechua. It is divided into four
sections. The first section describes the intonational contours observed, and argues 
that variation exists in the tonal alignment patterns underlying them, of which 
only a subset makes reference to a regularly determined word stress on the penult, 
while others only refer to phonological phrase and prosodic word boundaries. 
Section 6.2 explores the distribution of these alignment variants and relates them 
quantitatively to aspects of meaning and information structure. Section 6.3 relates 
the alignment variants with each other and the variation found in the Spanish data 
via an OT-analysis. Section 6.4 qualitatively explores further aspects of how infor- 
mation structure is cued in interaction between prosody and morphosyntax. 


6.1 Description of Huari Quechua tonal contours 


In section 3.3.3, it emerged from the discussion on the literature about prosody 
in Ancash Quechua that it is doubtful whether the existing accounts all describe 
the same phenomenon of word stress, and that the pitch behaviour described is 
perhaps more indicative of the behaviour of phrasal tones seeking alignment with 
prosodic edges. In this section I will provide an analysis of the intonational pat- 
terns observed in the Huari Quechua data. I will bring evidence to bear against the 
assumption that tonal events in this Quechua variety are homogeneously affected 
by a stress position determined at the word level. Instead, it will be argued that 
Quechua intonation is based on phrasal tones assigned at the level of the Phono- 
logical Phrase (corresponding to the Accentual Phrase in the descriptions of some 
languages). Three phrasal contours will be identified: a rising, an only-falling, and 
a rise-falling one (section 6.1.1). Some of the tones forming these contours always 
seek alignment with the phrase boundary (sections 6.1.3, 6.1.5). The variation in 
alignment of the others follows two main patterns: alignment with the word bound- 
ary (section 6.1.2), and with the word penult (section 6.1.4). There is also a marked 
pattern only observed on loanwords of Spanish origin (section 6.1.8). Even though 
the word penult is determined to be regularly metrically prominent (as argued in 
many existing accounts), crucially, its influence on prosody is found to be small 
overall and varying depending on the alignment pattern: in the word-boundary 
variant, it has no influence, while in the word-penult variant it serves as a land- 
mark for tonal alignment but nothing else. This is shown to result in quantifiable 
alignment differences between phenotypically similar contours of Huari Quechua 
and Spanish (section 6.1.6). The results in this section overall suggest that Huari 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783111304595-006


Quechua as represented by this data ranks very low on a typology of how much of
the prosodic phonology is affected by stress position, as proposed by Hyman (2014). 
This is further corroborated by the marked (and overall infrequent) behaviour of 
loanwords from Spanish: the stress position they are sensitive to is treated quite 
differently by the prosody. I suggest that the observed variability is at least partially 
responsible for the heterogeneous descriptions found in the existing literature. 

The qualitative analysis in this section will treat declaratives, but section 6.2.1 
demonstrates that utterance type is not an overly relevant factor influencing contour 
choice in Huari Quechua, similar to findings on other varieties (cf. Cole 1982; O'Rourke
2005). This is also confirmed in section 6.1.6, where peak alignment in the same
contour shows no evidence of differing between utterance types. 


6.1.1 Introduction to the rising and falling contours 


Most generally speaking, there are two different kinds of phrase-level tonal con- 
tours: falling and rising ones. Falling contours minimally include a fall from high, 
possibly followed by a low stretch, as (only-)falling contours. Optionally, the fall 
is also preceded by a rise and/or a high plateau, in rising-falling contours. Rising 
contours minimally consist of a rise, optionally preceded by a low stretch, and
optionally followed by a suspended high plateau and/or a further final rise. 


(97) AZ23 Conc Q 0960
segunda fila-chaw segundu-chaw kiru
second row-LOC second-LOC tooth
“in the second row, the second [one is the] tooth”²⁵¹


Both a rising and a rising-falling contour are produced in AZ23 Conc Q 0960²⁵²
((97)/Figure 114). The utterance consists of two phrases, segunda filachaw and


251 This is an example from Conc (cf. section 2.4 for a description). Thus it should be understood 
as “in the second row (segunda filachaw), the card in the second column (segunduchaw) is the one 
with the tooth (kiru)". 

252 Figures with Quechua examples include five tiers in the textgrid: a tier with an IPA transcrip- 
tion aligned with syllable boundaries (1), a tier separating morphological words in an orthographic 
transcription and aligned with the word boundaries (2), a tier separating the words into morphemes 
(3) and another with the English glosses (4), and one giving an approximate English translation (5). 
The syllabification in tier (1) is based on the Quechua phonotactics described in Parker (1976), ac- 
cording to which both onsets and codas cannot be complex and onsets are preferred over codas 
(VCVCV will be syllabified as V.CV.CV and not VC.VC.V). The IPA transcription is phonetic and aims
to be as faithful as possible, using both the auditive percept and the spectral information provided


Figure 114: AZ23 Conc Q 0960²⁵³ (declarative with a rising and a falling contour).


segunduchaw kiru, the first one with a rising contour and the second with a falling 
one. In the first one, segunda is basically flat, then pitch rises on the penultimate 


by Praat as a basis. Occasionally, this results in the transcription of additional vowels or consonants
(such as epenthetic vowels breaking up consonant clusters, glides between vowel segments, or 
nasal segments before voiced onsets) that are not represented in the morphological word form. The 
transcription also shows that many consonants are often subject to processes of lenition or place as- 
similation, and occasionally to dissimilations; and that especially word- and phrase-final vowels are 
often realized as centralized, with a breathy quality, or partially or completely unvoiced, with a con- 
siderable variability of realization in general. The extent of this varies between speakers. In cases 
of extreme vowel reduction I have tended to preserve Quechua phonotactics as described above, 
e.g. transcribing a realization of /hawanchaw/ with a severely reduced final vowel as [ha.wan.
tʃ], while a transcription approach ignoring any preconceived notion about supposed underlying
forms might have chosen [ha.wantʃ]. When there was absolutely no trace of a vowel segment, I also
sometimes chose a transcription violating the phonotactics. In any case, the spectrogramme and 
the interval boundaries in the figures allow readers to inspect the relative duration and amount of 
energy in the vowel segments for themselves. I have not used the length diacritic [:] at all, because 
it seems there is no principled way of using it here: phonologically long vowels (based on existing 
descriptions) are not always consistently produced with longer duration than their short counter- 
parts, even in the same utterance. The length of the syllable interval in the figure, compared with 
the morphological form given in tiers (2) and (3), where long vowels are represented by a repeti- 
tion of the vowel symbol, provides information about actual temporal realization. I do not think 
phonologically contrastive vowel length is nonexistent here, just that there is no principled way of 
using the length diacritic in a phonetic transcription of these data. Parker (1976) describes process- 
es responsible for the shortening of (morphologically) long segments, e.g. one where a syllable that 
is morphologically CV:C is produced as CVC, because maximal syllables are either CV: or CVC. In 
the data here, vowels in these positions seem to show the same durational variability as all others. 
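
The syllabification principle used for tier (1), with simple onsets and codas only and onsets preferred over codas, can be sketched procedurally. The following Python function is a minimal illustration that works on abstract CV skeletons rather than on real transcriptions; treating a long vowel or an orthographic digraph as a single `V` or `C` is an assumption of the sketch, and the function name is hypothetical.

```python
from typing import List


def syllabify(skeleton: str) -> str:
    """Insert syllable boundaries into a CV skeleton such as 'VCVCV'.

    A single intervocalic consonant becomes an onset (V.CV); a cluster of
    two is split into coda and onset (VC.CV). Anything that would need a
    complex onset or coda is rejected.
    """
    if set(skeleton) - {"C", "V"}:
        raise ValueError("skeleton may only contain 'C' and 'V'")
    nuclei: List[int] = [i for i, seg in enumerate(skeleton) if seg == "V"]
    if not nuclei:
        raise ValueError("a skeleton needs at least one vowel")
    if nuclei[0] > 1 or len(skeleton) - 1 - nuclei[-1] > 1:
        raise ValueError("complex onset or coda at the word edge")
    cuts: List[int] = []
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = right - left - 1
        if cluster > 2:
            raise ValueError("cluster would need a complex onset or coda")
        # The last consonant before a vowel is its onset; with no
        # consonant at all, the boundary falls between the two vowels.
        cuts.append(right - 1 if cluster else right)
    parts: List[str] = []
    start = 0
    for cut in cuts:
        parts.append(skeleton[start:cut])
        start = cut
    parts.append(skeleton[start:])
    return ".".join(parts)


example = syllabify("VCVCV")
hawanchaw_like = syllabify("CVCVCCVC")  # skeleton with "ch" counted as one C
```

The sketch covers only the phonotactic default. The phonetic transcriptions in the figures additionally reflect epenthesis, lenition and vowel reduction, which are outside its scope.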
253 https://osf.io/acp7y/


syllable of the phrase (la of filachaw) and stays high on the final syllable. In the
second one, pitch is flat on the first two syllables, then rises on the penult of the first 
word segunduchaw and stays high until the first syllable/penult of kiru, then falls 
steeply, forming a low stretch on the final syllable. Which contour is realized does 
not depend on the lexical items involved,²⁵⁴ as (98)/Figure 115 shows.


(98) LC34 MT Q 1037 and XQ33 MT Q 1053 
hirka hawa-n-pa 
hill below-3-GEN 
“below the hill” 


Figure 115: LC34 MT Q 1037²⁵⁵ (declarative realized with a falling contour) and XQ33 MT Q 1053
(confirmation-seeking polar question realized with a rising contour). 


Here, two utterances that are lexically and syntactically identical but pragmatically 
different are realized by two speakers in direct succession, first with a falling and 
then with a rising contour.²⁵⁶ The same word hawanpa realizes a fall on its penult in
the first case and a rise in the second, while no tonal event occurs on hirka, which 
is overall high in the first, and overall low in the second utterance. Hirka does not 


254 Except to the extent that additional alignment patterns are optionally available for loanwords
from Spanish, for which see section 6.1.8. In the preceding sections, any examples containing Span-
ish loanwords show intonational behaviour also available to “native” Quechua words, which is
their majority behaviour (cf. 6.1.8.3).

255 https://osf.io/mfbse/ 

256 In a very similar context as the one described for Huari Spanish confirmation-seeking ques- 
tions, cf. 5.1.2.2. 




belong to a different lexical class from hawa, e.g. of unaccented vs. accented words, 
respectively, as might be suggested from the perspective of languages like Japanese 
or Basque (cf. section 6.1.8.2). This is demonstrated by (99)/Figure 116, where the 
same lexical item is realized with a rising-falling contour. 


(99) XQ33 MT Q 1213 
hatun ka-q hirka-pa 
big COP-AG hill-GEN 
“the hill that is big”


Figure 116: XQ33 MT Q 1213²⁵⁷ (confirmation-seeking polar question with falling contour).


No evidence was found for a lexical distinction affecting the type of pitch event that 
can be realized, apart from when words of Spanish origin are involved, for which 
see section 6.1.8. Nor does a certain class of suffix such as e.g. the genitive -pa here 
effect the presence or absence of pitch accenting on a word (as is the case in some 
varieties of Basque, cf. e.g. Elordieta 1998; Hualde 1999; Hualde et al. 2002). That 
can be seen from (100)/Figure 117, where qiru-pa is just as lacking in pitch events 
as the two instances of hirka above, while waqta-n-pa does realize a pitch event. 


(100) LC34 MT Q 0095 
qiru-pa waqta-n-pa 
wood-GEN behind-3-GEN 
“behind the wood” 


257 https://osf.io/rquxg/


Figure 117: LC34 MT Q 0095²⁵⁸ (part of a declarative with falling contour).


That there are no categorical lexical restrictions on the type of word on which the low
or high stretch or the tonal transition can be realized also holds for the differ-
ence between lexical and function words. In XQ33's Conc Q 0690 ((101)/Figure 118),
a rising contour is realized on two nouns, with the rise taking place only on the 
penult of the second, the spatial noun hawanchaw “in its below²⁵⁹”, and the first
noun realized low without any pitch event. 


(101) XQ33 Conc Q 0690 
qillay hawa-n-chaw krus 
money below-3-LOC cross 
“below the money [is] the cross” 


258 https://osf.io/y97cx/ 

259 As can be seen from the gloss in (101), this is analyzed as hawa-n-chaw “below-3-LOC”, with the
3rd person suffix referring to the preceding noun qillay “money”, thus “below the money”. I label
as “spatial nouns” a group of Quechua words that behave syntactically just like any noun, but refer
to spatial relations (cf. Parker (1976: 87-88)). To give an example, in a possessive construction like
qillay hawa-n-chaw, marking the possessor additionally (glossed as genitive) is also possible, yield-
ing qillay-pa hawa-n-chaw “money-GEN below-3-LOC”, with essentially the same meaning. The
marking can also be reversed, i.e. hawa-pa qillay-nin-chaw “below-GEN money-3-LOC” (with the
3rd person marker in the form -nin because of the phonotactic restriction to simple codas), so that
the meaning becomes “in the money from below” (e.g. as opposed to money from above). The fact
that these spatial nouns regularly take person markers in Quechua is adduced by several authors 
to account for the frequency of parallel constructions, such as en su debajo del dinero, (instead of 
debajo del dinero), which are also attested in abundance in our data, in the Spanish varieties of 
Quechua-speaking regions (cf. Escobar 2000).


Figure 118: XQ33 Conc Q 0690²⁶⁰ (declarative with a rising and a high flat contour).


Examples like TP03 Conc Q 1372, QF16 Conc Q 1737, and XU31 MT Q 1855 ((102)/
Figure 119, (103)/Figure 120, (104)/Figure 121, respectively) show in turn that
even the function words tsay and kay, the distal and proximal demonstratives, can
realize a rising or falling pitch event, at least if they function as pronouns, expand-
able by suffixes just like any other noun, rather than as determiners.


(102) TP03 Conc Q 1372 
kanan allawka ladu-yki-pa aja tsay-chaw-mi 
now right side-2-GEN yes DEM.DIST-LOC-ASS 
“now, to your right side, yes, there” 


(103) QF16 Conc Q 1737 
arash kay-chaw 
lizard DEM.PROX-LOC 
“[the] lizard, here”


260 https://osf.io/xe759/


Figure 119: TP03 Conc Q 1372²⁶¹ (declaratives consisting of rising and falling contours).


Figure 120: QF16 Conc Q 1737²⁶² (declarative consisting of two rising contours).


(104) XU31 MT Q 1856
tsay-pita ultu-ta rikaa-nki yaku-pita yarqa-mu-sha-ta
DEM.DIST-ABL tadpole-OBJ see-2 water-ABL leave-CIS-PRTCP-OBJ
“then you see a tadpole coming out of the water”


261 https://osf.io/g3f56/ 
262 https://osf.io/bruj9/


Figure 121: XU31 MT Q 1855²⁶³ (declarative consisting of five rising contours).


Examples like XQ33 Conc Q 0531 ((105)/Figure 122), HA30 MT Q 1241 ((106)/
Figure 123), QF16 MT Q 1348 ((107)/Figure 124), or HA30 Conc Q 0211 ((108)/Figure
125) demonstrate that the low stretch following the fall in falling contours is not
limited to only the final syllable of the phrase but can in fact encompass one or 
several full words (although such cases as HA30's MT Q 1241, where it seems to 
cover the last 3-4 words, are rare). The fall also does not have to take place in the 
last word of the phrase. 


(105) XQ33 Conc Q 0531
quena ladu-n-chaw chugllu-pa aqtsayku-n
flute side-3-LOC corncob-GEN corn.hair-3
“beside the flute is the hair of the corncob”


(106) HA30 MT Q 1241
tsay naani hawa-n-chaw-na-m huk runa
DEM.DIST path below-3-LOC-DISC-ASS one person
ichira-ykaa-n qillay-ni-n haya-ku-shaa
stand-PROG-3 money-FON-3 carry-MID-PRTCP
“below the path there now a man is standing holding money”


263 https://osf.io/Btybp/


Figure 123: HA30 MT Q 1241²⁶⁵ (declarative with a rise-falling contour).


(107) QF16 MT Q 1348
atuq ka-q-man chaa-tsi-y
fox COP-AG-DEST arrive-CAUS-INF
“let it get to the fox²⁶⁶”


264 https://osf.io/ujxny/ 
265 https://osf.io/dvc76/ 
266 This utterance clearly has imperative force. Prosodically, this is not expressed (imperatives 
having no prosodic form distinct from declaratives). Morphologically, this is here (and regularly)


Figure 124: QF16 MT Q 1348²⁶⁷ (imperative consisting of a falling contour).


(108) HA30 Conc Q 0211 
washa kantu-chaw ukush-mi ka-ykaa-n
DEM.DIST edge-LOC mouse-ASS COP-PROG-3 
“at that edge there is a mouse” 


Equally, examples like (110)/Figure 127 and (111)/Figure 128 (in the next section) show
that the low stretch preceding the rise in contours that have it is not restricted to a 
single word. 

From examples like (98), where the first hirka hawanpa is uttered by the
speaker who has the path on the map in the maptask, and the second by the one 
without the path, it could be thought that the falling contour corresponds to declar- 
atives, and the rising to (polar) interrogatives. In fact, both types of contours occur 
on utterances where the context points to a declarative function as well as in those 
with an interrogative function. (99) already shows that a confirmation-seeking 
polar question, realized with a rising contour in (98), can also be realized with a 


indicated by the morpheme -y, which is glossed as -INF, in accordance with the glossing conven- 
tions of Bendezú Araujo et al. (2019). This is because it is formally identical to the “infinitive” suf-
fix -y. It can’t be said whether these infinitive imperative constructions are therefore parallel to
the use of infinitive constructions for directive functions in languages such as Spanish (hablar de
otras cosas “(let's) speak of other things!”) or German (jetzt mal die Klappe halten “shut your mouth
now”), where there are additional other means of marking imperatives. In Quechua, -y is the only
morpheme for positive imperatives; negative imperatives (vetatives) are formed with the special 
vetative particle ama sentence-initially together with the negation suffix -tsu attached to one of 
the words in the sentence. 

267 https://osf.io/wrjpn/


Figure 125: HA30 Conc Q 0211²⁶⁸ (declarative with two rising contours and one falling contour).


falling contour. Section 6.2.1 will delve a little further into the relation between 
contour type and interrogativity. 

This section has shown that no lexical or morphological distinction has a cate- 
gorical influence on whether a pitch event takes place on a word or not and which 
form it takes, and that a single tonal contour often encompasses several words, 
regardless of its shape (rising, falling, or rising-falling). A further general obser- 
vation is that in many utterances, pitch contours consist of relatively flat and long 
stretches sometimes extending over more than a single word, whether pitch is high 
overall (plateau-like realization), or low. The process generating these pitch con- 
tours is clearly phrase-based, not word-based. The functional differentiation of the 
contours will be discussed in detail in section 6.2, where it will be shown that rising 
contours can broadly be characterized as continuing or incomplete and falling ones 
as closing or complete (6.2.2). It will also be argued there (6.2.3) that it is the prom- 
inence and position of the words in a phrase that determine the location of the low 
and high stretches in it. In the following sections, I will provide a description of 
the alignment patterns in these contours regarding possible locations for the rise 
and fall in multi- and single-word phrases, using examples from the tasks Conc, 
Maptask and Cuento. 


268 https://osf.io/m4ec2/

6.1.2 Alignment of the tonal transition with the word boundary


(109) KP04 Conc Q 1532 
tsay tikra-nqa-yki ka-q tsuqllu
DEM.DIST turn-NMLZ-2 COP-AG corn
“the one you turned around [is] corn”²⁶⁹




Figure 126: KP04 Conc Q 1532²⁷⁰ (declarative consisting of a rise-falling contour).


In KP04's Conc Q 1532, given in (109) and Figure 126, there is a single large rise-falling
contour encompassing the whole utterance, consisting of an initial low stretch, a rise, a
high plateau, a fall, and a final low stretch. The initial low stretch is realized on the
determiner tsay. The largest part of the following increase in pitch height is already
accomplished at the beginning of the vowel segment of the initial syllable of the second
word tikranqayki, with just the last bit of it realized as an actual


269 Note that the verb of being ka- is not used in Quechua in the third person present (ka-n) when
expressing a copular relation. Instead, no verb at all is used. This only holds for the third person
present; forms of ka- are used for all other persons and tenses. In its third person present form,
ka- is only used to express existential meaning (allqu kan “there is a dog”, not “x is a dog”), cf.
Parker (1976: 68). The form ka-q used here, with the agentive suffix -q, does not fulfil a copular
function between tikranqayki “what you turn around” and tsuqllu “corn”; instead it forms
part of the complex noun phrase tsay [[tikranqayki] kaq], where tsay is a determiner and the
contribution of kaq has been translated in English with the construction “the one ...”. For further
information on kaq and other aspects of Huari Quechua syntax and information structure, see
Bendezú Araujo (2021).

270 https://osf.io/bgxzj/

rise. Note that the syllable ti is neither the penult of the entire phrase nor that
of the word. Pitch then basically remains eventlessly high over the course of the
two words tikranqayki kaq until the initial/penultimate syllable of the final word
tsuqllu, during which it begins to fall. Again, the largest part of the pitch transition
takes place over voiceless segments, and by the time the vowel segment of the final
syllable of the word and the entire utterance is reached, pitch has decreased
substantially; it then remains low. This utterance is exemplary for an alignment pattern
in which the initial rise (both of the rising-falling and the rising contour) takes place
not on a particular syllable in one of the words on which the contour is realized,
but at a boundary between those words. I will call this the word boundary pattern.
KP04’s Conc Q 1576 ((110)/Figure 127) shows that the initial low stretch can encompass
more than one word also in this pattern.


(110) KP04 Conc Q 1576 
tsay kay hawa-n ka-q wanupakush 
DEM.DIST DEM.PROX below-3 COP-AG burial 
“that one that's below this [is the] burial”




Figure 127: KP04 Conc Q 1576 (declarative with a rising and a rise-falling contour).


There and in LC34's Conc Q 0110 ((111)/Figure 128), the word boundary pattern
occurs on rising contours.

(111) LC34 Conc Q 0110
primer qalla-na-n ka-q-chaw ashkash
first begin-DISC²⁷²-3 COP-AG-LOC lamb
“now the one starting the first one is the lamb”




Figure 128: LC34_Conc_Q_0110 (part of a declarative with a rising contour).


The fall in falling contours can also be aligned with a word boundary, as Figure 125 
above shows. However, often the falling pitch event stretches over both the penult 
of a word and a word boundary, so that it is difficult to determine which land- 
mark it actually aligns with (cf. Figures 122 and 123 above). The alignment pattern 
showcased in this section leads to the conclusion that a phonological phrase in the 
Huari Quechua data can contain several prosodic words. The level of the phrase 
determines the extent of the pitch contour, and, under this alignment pattern, the 
boundaries of the prosodic words are the landmarks that tonal alignment refers to. 
However, this is only one option attested in these data. 


272 The suffix -na glossed here as DISC(ontinuative) is sometimes used to indicate something like
a change of topic, which the translation tries to approximate by the use of “now” as a sentence
adverbial. In other cases its use can also be translated by “already” (cf. Parker 1976: 146).

6.1.3 Alignment with the phrase boundary and phrase-final devoicing


Regarding tonal alignment with the phrase boundary, all contours seen so far show
evidence for it in an uncontroversial way: the left-peripheral tones (the first L tone
in rising and rising-falling contours, the H tone in only-falling contours) must be
aligned with the left edge of the phonological phrase; the right-peripheral tones
(the L tone in only-falling contours, the second L tone in rising-falling contours,
the H tone in rising contours) with the right. Here I will show that the point of tonal
transition, so to speak the “inner” alignment, also refers to the right phrase edge in
some of the contours. Potential cases in point are abundant: in utterances with rising
contours like (97)/Figure 114, (101)/Figure 118, or (103)/Figure 120, the observed pitch
movement of a rise taking place on or after the penult of the last word in the phrase
could well result from an alignment of both the L and the H tone to the right phrase
boundary, with tonal crowding ensuring that the L tone only reaches into the penult
because the H tone already occupies the rightmost TBU. Similarly, in falling contours,
the high pitch extending until the penult of the final word, as in (109)/Figure 126,
(110)/Figure 127 (rising-falling), or (112)/Figure 129 (only-falling), with the fall
taking place on or after it, can be analyzed as the L tone occupying the final TBU in
the phrase and the H the preceding one, both seeking to align as far rightwards in the
phrase as possible.


(112) ZR29 Conc Q 1661 
ya pitu ladu-n-chaw uusha 
ok flute side-3-LOC sheep 
“ok beside the whistle [is] the sheep”


The obvious alternative analysis is that the H tone in both the rising and falling con- 
tours is aligned with the penult of the (final) word in these cases. Under that analy- 
sis, the fact that the (phrase-final) penult is roughly where the tonal transition takes 
place would be because it serves as H tone association or alignment site by virtue of 
being stressed (b in Figure 130), as opposed to because this is what happens when 
two tones in linear succession seek to be realized as rightmost as possible with the 
one to the right occupying the rightmost TBU and pushing the other TBU to the left 
(a in Figure 130). 

In the next section (6.1.4), evidence will be provided that something like anal- 
ysis b (also applied to rising and only-falling contours) must be right for some of 
the utterances in the data, whereas in this section I will proceed to show that for 
others, only a can hold.


Figure 129: ZR29 Conc Q 1661 (declarative with a falling and a rising contour).


[Schematic: (a) tones aligned only with the phrase edges; (b) H tone aligned with the stressed penult of the final word.]


Figure 130: Two possible analyses for schematized rising-falling contours. 


In this Quechua variety, vowels in phrase- and especially utterance-final syllables
are sometimes devoiced, as are high vowels following voiceless fricatives
or plosives (similar to what Delforge 2011 describes for Cuzco Quechua). When
domain-final devoicing takes place, the phrase-final behaviour of the tonal contour
is frequently not cut short, but instead occurs earlier, further “to the left”. This
occurs in TP03 Cuent Q 1663 ((113)/Figure 131) and OA32 Cuent Q 1222 ((114)/Figure 132).

(113) TP03 Cuent Q 1663
maa cuenta-ri-shayki huk cuento-ta
let's.see tell-ITER-1.SUB>2.OBJ.FUT one story-OBJ
cuenta-yaa-maa-nqa-n-ta
tell-PL-1.OBJ-NMLZ-3-OBJ
“let's see, I’m going to tell you a story they told²⁷⁵ me”



Figure 131: TP03_Cuent_Q_1663 (declarative consisting of three falling contours).


(114) OA32 Cuent Q 1222 
tsay mosca-ta miku-ski-r-ni-n-qa imanuy-taa 
DEM.DIST fly-OBJ eat-ITER-SUBID-FON-3-TOP how-DETVAR
siente-ku-rqa-n-qa 
feel-MID-PST-3-TOP 
“when [he] ate the flies, how did [he] feel?” 


275 What has been translated as a relative clause here is a tenseless nominalization with -nqa
(homophonous, but not identical, to the marker of the third person future -nqa), usable in both
perfective and imperfective contexts. The verb form nominalized with -nqa usually does not bear
tense markers (cf. Parker 1976: 172).


Figure 132: OA32 Cuent Q 1222²⁷⁷ (wh-question with two rising and two falling contours).


In TP03 Cuent Q 1663's cuentayaamanqanta, realized as [kven.tee.ja.mor.yen.tg],
the last vowel (of the object suffix -ta) is produced without voicing. Yet there is
no pitch peak or end of the plateau on the penult [yen]²⁷⁸ of the word, with the
following fall cut short, as would be expected under an account where a H tone is
associated or aligned with a supposedly prominent word penult. Instead, the peak
is realized one syllable earlier, “displaced”, as it were, onto the antepenult [mən],
and followed by a full realization of the fall with a low elbow reached on the penult.
Similarly, in OA32 Cuent Q 1222, sientekurqanqa, realized as [sion.to.ku.ro.yer).qe],
with an epenthetic schwa inserted in the cluster /rq/, the peak occurs not on the
word penult [yen], but on either [ku] or [ra], the preceding syllable(s), while it is
the elbow of the following fall that is realized on the penult, followed by a (nearly)
voiceless final syllable. A third example is QF16 MT Q 1282 ((115)/Figure 133), a
single-word utterance realized with an only-falling contour. The last two syllables
are voiceless (the penult realized with very creaky voice, the final syllable completely
devoiced), and the fall takes place on the preantepenult of the word, with
the antepenult, the last voiced syllable, realized low.


277 https://osf.io/pw2ad/ 

278 The breathy voice diacritics [◌̤] here and elsewhere indicate a vowel produced audibly with
increased breathy air flow and reduced energy in the formant spectra. As here, a breathy vowel
often occurs in a syllable preceding a fully unvoiced one, suggesting that both are due to a
gradually increasing domain-final exhalation process.

(115) QF16_MT_Q_1282
hawa-n-pa-taaku 
below-3-GEN-NEG 
“not below it” 




Figure 133: QF16 MT Q 1282 (declarative consisting of a falling contour).


Such “displacement” is unexpected under an account where the penult is stressed
at the level of the prosodic word, with the H tone as a post-lexical pitch accent
anchored to it. In such a scenario, the final syllable being unvoiced should not affect
the anchoring of the H tone to the supposedly prominent penult, because the necessary
condition for tonal association, namely the status of the penult as stressed, is
determined prior to, or at least independently of, the unvoicing. Furthermore,
the penult itself, as the stressed syllable, should not be able to be devoiced at all, since
deletion and other weakening processes in languages with stress usually target all
other material instead of the stressed position. In Huari Spanish, a similar devoicing
process occurs in the speech of some speakers, but there, affected pitch movement
unambiguously shows the behaviour of pitch accents associated with stressed
syllables, i.e. it is either less affected than on unstressed material or the associated
pitch movement is deleted instead of displaced. See e.g. the utterance-final part of
ZZ24’s ELQUD ES 26 (Figure 134). There, the utterance-final syllable [ka] of roca
“rock” is also devoiced, but the preceding pitch peak due to the LH* pitch accent
on the stressed syllable [ro] takes place on that syllable and is not “displaced” to
the left. On encima, the stressed vowel in [si] is devoiced due to the presence of the
voiceless fricative.²⁸⁰ Here it is clearly visible from the remaining pitch contour
that the rising movement due to the LH* pitch accent on that syllable is removed,
instead of displaced.




Figure 134: (part of) ZZ24 ELQUD ES 26²⁸¹ (perro encima de la roca ‘dog above the rock’).


The displacement is best explained via an account which makes no reference to
a prominent position determined at the word level (a in Figure 130), and where
instead alignment of tones with TBUs happens concurrently with devoicing:
phrasal tones seek alignment on the rightmost available TBUs. Those seem to be
vowel nuclei with crowding disallowed. Then, the processes of final devoicing
and tone alignment can go in tandem: unvoicing makes a syllable unavailable as
a TBU. Consequently, rightmost alignment will then align the two right tones with
the penult and the antepenult, respectively. What looks like a “displaced accent”
thus simply falls out from that. Whether the unvoicing or the tone assignment
happens “first” is hard to determine. However, there are also occasionally some
examples like HA30 MT Q 2044 ((116)/Figure 135), where the peak/end of the
plateau and fall take place earlier even though no devoicing happens on the last
syllable(s).
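
The logic of analysis a can be made explicit in a small sketch. It is only an illustration under stated assumptions, not an implementation used in this study: the function name, the per-syllable voicing flags and the one-tone-per-syllable restriction are simplifications introduced here, and the requirement that the tones occupy contiguous voiced syllables is taken from the tentative suggestion made for example (116).

```python
from typing import List, Optional


def align_phrase_final_tones(voiced: List[bool], tones: List[str]) -> List[Optional[str]]:
    """Place phrase-final tones on the rightmost run of voiced syllables.

    voiced: one flag per syllable of the phrase (False = devoiced, no TBU).
    tones:  the right-edge tones in linear order, e.g. ["H", "L"].
    Assumptions (hypothetical, for illustration): one tone per syllable
    (no crowding) and the tones must sit on contiguous voiced syllables.
    """
    n, k = len(voiced), len(tones)
    if k == 0 or k > n:
        raise ValueError("need at least one tone and enough syllables")
    # Try the rightmost window first, then shift leftwards one syllable at a time.
    for start in range(n - k, -1, -1):
        if all(voiced[start:start + k]):
            result: List[Optional[str]] = [None] * n
            for offset, tone in enumerate(tones):
                result[start + offset] = tone
            return result
    raise ValueError("no run of voiced syllables long enough for the tones")


# Six syllables, all voiced: H on the penult, L on the final syllable.
print(align_phrase_final_tones([True] * 6, ["H", "L"]))
# Six syllables, final one devoiced, as described for (113):
# H on the antepenult, L on the penult.
print(align_phrase_final_tones([True] * 5 + [False], ["H", "L"]))
```

With the last two of five syllables devoiced, as described for (115), the same function places H on the preantepenult and L on the antepenult. With nine syllables of which only the penult is devoiced, as described for (116), it again places the tones on the preantepenult and the antepenult, because the voiced final syllable has no voiced left neighbour.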


280 Deletion of high stressed vowels due to voiceless context does not occur frequently in the 
Huari Spanish data. In languages without word stress, deletion processes more often affect posi- 
tions anchoring high tones: in Japanese e.g., syllables with lexical pitch accent (cf. section 3.3.2) are 
regularly devoiced if they have a high vowel and occur in the appropriate phonological context 
(Venditti et al. 2008: 480–481).

281 https://osf.io/5sfym/




(116) HA30_MT_Q_2044 
y tsay pitu-chaw pitu hita-raa-ykaa-nqa-n-pita 
and DEM.DIST whistle-LOC whistle throw-ITER-PROG-NMLZ-3-ABL
“and in that whistle, from where the whistle is lying” 




Figure 135: HA30 MT Q 2044²⁸² (parts of a declarative with two falling contours).


The second phrase pitu hitaraaykaanqanpita, realized as [pi.ta.hi.te.réj.kan.gam.p^i.ta],
is produced with an only-falling contour beginning with a high plateau that
continues evenly (disregarding consonantal microprosody) until the syllable [kan],
on which a fall begins that reaches its low elbow in the next syllable [gam]. Here,
[kan] and [gam] are the preantepenultimate and the antepenultimate syllable in
the phrase, respectively. A possible reason for this even earlier realization of the
fall might be found in the penult, which is almost completely devoiced. It might be
that the voicelessness of the penult causes the alignment of the two phrase-final
tones to be realized on the two rightmost contiguous TBUs, even though the final
syllable is again voiced and produced with considerable energy. An account using
the word penult as the prominent position to which the high tone is associated
would be at a total loss here.²⁸³ For utterances like the ones shown in this section,


282 https://osf.io/wj9cv/ 

283 One theoretical alternative would be to assume fixed but varying stress positions for different 
lexical items, perhaps in a syllable window of some size, as in Spanish. However, this cannot be the 
case, since on the one hand, variation in final peak alignment occurs even in the same lexical item 
across different utterances. On the other, as we have seen in these examples, Quechua words can be 
quite long and morphologically complex, with the amount of information conveyed by one of the 
longer ones perhaps closer to that of a sentence in a European language than that of a single word.

analysis a is clearly the only option. Further evidence against analysis b consists
in cases where no final vowel devoicing occurs, but where instead a medial vowel
is entirely eliminated, leading to resyllabification with the penult itself sometimes
reduced. This happens in OA32's MT Q 2915 ((117)/Figure 136), where pinkullutam,
produced with a falling contour, is realized as [pin.kul.tam], with the peak before
the fall occurring on [kul], which is the penult after reduction, but at the cost of
completely eliminating the vowel of what would have been the penult without
reduction, the syllable /ʎu/ of /pin.ku.ʎu.tam/.


(117) OA32 MT Q 2915
pinkullu-ta-m rikaa-:
flute-OBJ-ASS see-1
“I see [a] flute”




Figure 136: OA32 MT Q 2915 (declarative consisting of two falling contours).


This syllable in this lexical item is not regularly reduced (cf. e.g. the realizations 
given in Figure 142), but there is no trace of the high back vowel following the 


To suggest that one particular instantiation of a given lexical root plus a certain chain of suffixes 
has a lexically fixed stress on the penult, another on the antepenult and yet another on the prean- 
tepenult seems exceedingly implausible, especially given that these word forms are the result of 
productive formation, and the rules for combining roots and suffixes allow a vast array of possible 
combinations, so that each individual combination of such length will have quite a low frequency 
of occurrence, making a lexicalized stress position for each of them improbable. Such a hypothesis 
has also never been put forth for any Quechua variety.

lateral here at all. This is incompatible with an account of word stress on the penult
tweaking all acoustic parameters to cue this prominence, reducing atonic and
preserving tonic syllables. Clearly a better hypothesis is that the reduction process is
independent of word stress, with the penult just as available to be reduced as other
syllables. Tonal alignment also does not make reference to stress and is affected by
the segmental chain merely with regard to what qualifies as a TBU.

Applying the difference between the two analyses to rising contours yields an
interesting prediction. As Figure 137 shows, under the analysis which only assumes
phrase edge-alignment (a), the penult of the final word is expected to be low and
the rise to occur on or after it, because the H tone only occupies the rightmost TBU.
Under an analysis assuming alignment of the H tone with the stressed penult (b),
the penult is expected to be high and the rise to begin earlier.


[Schematic: (a) H tone on the rightmost TBU only, penult of the final word low; (b) H tone aligned with the stressed penult of the final word, penult high.]


Figure 137: Two possible analyses for schematized rising contours. 


This difference is borne out by observation: utterances like (101)/Figure 118 realize
a contour clearly more compatible with b, with the penult of the phrase-final
word hawanchaw mostly high already. On the other hand, in utterances like
QF16_Conc_Q_1205 ((118)/Figure 138), the penult on the phrase-final word wayinchaw is
clearly realized with a low stretch, and only the final syllable is high.


(118) QF16_Conc_Q 1205 
runa wayi-n-chaw
person house-3-LOC
“a person in their house”


For utterances like Figure 138, analysis a is clearly the more adequate option, 
demonstrating that a subset of the rising contours here exhibits a purely edge- 
based alignment. Yet as just seen, analysis b must also hold for a different subset, in 
which the (final) word penult is a relevant anchor for tonal alignment. The distinc- 
tion between these two variants is also upheld when considering phrases realized 
on individual words (see section 6.1.5). In the next section we will first consider 
further contours for which analysis b is more appropriate.


Figure 138: QF16 Conc Q 1205 (two rising contours realized on part of a declarative).


6.1.4 Alignment with the word penult 


The utterances discussed in this section, together with the arguments made just
above, will present further evidence for the relevance of the penult of an individual
word as a position with which the rise or the fall in the rising or falling contours
is aligned. This solidifies the observation that in the Huari Quechua data overall,
tonal alignment is possible with the edge of the phonological phrase, the edge of the
prosodic word, and a regular word stress position. At least two of these alignment
patterns are clearly in competition with each other, resulting in variant realizations.

In some utterances the rise in the rising contour occurs on the penult of a pre-final
word in a phrase, with the following high plateau extending until the end of the
phrase, cf. aptasha kaykaan in XU31 Conc Q 0615 ((119)/Figure 139), or akakakuna
niyaashqa in XQ33 Cuent Q 1302 ((120)/Figure 140).


(119) XU31 Conc Q 0615
runa bolsa-n apta-sha ka-ykaa-n escoba-pa ladu-n-chaw
person bag-3 carry-PRTCP COP-PROG-3 broom-GEN side-3-LOC
“a person is carrying a bag, besides the broom”

Figure 139: XU31 Conc Q 0615²⁸⁶ (declarative consisting of 5 rising contours).


(120) XQ33 Cuent Q 1302 
hampi-ku-q-mi aywa-ykaa-: akaka-kuna ni-yaa-shqa
heal-MID-AG-ASS go-PROG-1 woodpecker-PL say-PL-PST
“‘I am going to get myself healed’; [and] the woodpeckers said”


These contours are less frequent than their counterparts where the low stretch
extends until the penult of the final word in a phrase.²⁸⁷ Yet they do occur, and they
unambiguously require an analysis in which the word penult is a relevant position
for tonal alignment in a phrase. The same can be applied to falling contours with
the fall occurring on pre-final words. Some of them, like (108)/Figure 125, realize
the fall between two words, using the word boundary as alignment landmark,
while others like (107)/Figure 124 realize it after the penult on a pre-final word.
This latter realization is an instance of alignment with the word penult in falling
contours, the counterpart to the rises just discussed. Note that both phrases realizing
the tonal transition on a prefinal word and those realizing it on a final word share
the property that only a single tonal transition takes place in the phrase. That is to
say, it is only the word penult of one word in the phrase that serves as tonal alignment
anchor. It seems logical to assume that this is the word that is most prominent,
i.e. only σ' in ω' can serve as alignment anchor, while all other σ's of mere ωs in φ
are ignored by tone alignment.


286 https://osf.io/q83es/ 
287 In section 6.2.3.3, they will be argued to cue a marked metrical structure with prominence on 
a prefinal word in the phrase.

Figure 140: XQ33_Cuent_Q_1302 (two declaratives with one rising and one falling, and a rising
contour, respectively).


In this context it should be noted that the only observable effect of this word-level
prominence is precisely the possibility for anchoring the tonal transition, and
nothing else. As Buchholz & Reich (2018) showed, the word penult does not consistently
attract higher pitch or longer duration, and several examples already seen
demonstrate that vowel quality is not more peripheral, nor spectral energy
necessarily higher, in that position either: e.g. Figure 140, where the syllable [ku]
on which the rise takes place is shorter than the surrounding ones and has less
spectral energy, or Figure 118, where the vowel of [wan] on which the rise takes
place is more central than on surrounding syllables. Thus even in this alignment
pattern where the tonal grammar makes reference to the metrical representation,
the prominence on the penult is quite abstract. Its effects are certainly quite different
from what can be observed in “typical” word stress languages like English,
where many more phonological processes cannot be adequately described without
making reference to it (cf. Hyman 2014, section 3.3.2). In this respect, Quechua certainly
bears more resemblance to Japanese, where vowels bearing a lexical pitch
accent are regularly devoiced when they are high (cf. Venditti et al. 2008: 480–481;
Beckman 1986). But following Hyman (2014), it would be misleading to typologize
Quechua as categorically different, e.g. by labeling it a “pitch-accent”, a “stressless”
or an “edge-prominent” language. By adopting a “properties-based typology”
(Hyman 2018) interested in whether more fine-grained individual properties are
shared or not, the more informative assessment can be made that Huari Quechua
utterances with this alignment pattern share only one out of 8 features with prototypical
“stress-accent languages”, namely that “[l]exical stress provides the designated
terminal elements for the assignment of intonational tones (‘pitch-accents’)”
(Hyman 2014: 58, cf. (10) in 3.3.2). Even this does not hold for utterances where
tones seek alignment only with phrase or word boundaries. There is thus evidence
for different variants in the intonational grammar available to the speakers providing
the Huari Quechua data here. They differ precisely with regard to whether
the information about a prominent location at the word level is relevant to their
description or not. Just like Huari Spanish, the Huari Quechua data do not behave
uniformly with respect to this typology.

We have now seen evidence that in a multi-word phrase, there can be three 
different kinds of landmarks serving as anchoring points for tonal transitions: the 
phrase-peripheral domain boundaries (in all variants), a phrase-internal bound- 
ary of the next lower domain, the prosodic word (in one alignment variant), and a 
regular prominent position, the penult of the strongest prosodic word in the phrase 
(in another alignment variant). Between the alignment variants, there is no reason 
to assume a differing tonal sequence or inventory going beyond LH for rising, HL 
for only-falling, and LHL for rising-falling contours. What differs between them is 
only where the transitions take place. 
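
The three kinds of landmark can be contrasted in a short sketch for rising (LH) contours. It is a schematic illustration only: the function name, the coarse per-syllable L/H levels and the syllabifications used below are assumptions introduced here, and only-falling and rising-falling contours are not modelled.

```python
from typing import List


def rising_contour(words: List[List[str]], landmark: str, word_index: int = -1) -> List[str]:
    """Return one level ('L' or 'H') per syllable for a rising (LH) contour.

    words:      the phrase as a list of words, each a list of syllables.
    landmark:   'phrase_edge'   -> only the final syllable of the phrase is high;
                'word_boundary' -> pitch is high from the left edge of words[word_index];
                'word_penult'   -> pitch is high from the penult of words[word_index].
    word_index: the word supplying the landmark (ignored for 'phrase_edge').
    """
    if not words or any(len(word) == 0 for word in words):
        raise ValueError("every word needs at least one syllable")
    if word_index < 0:
        word_index += len(words)
    if not 0 <= word_index < len(words):
        raise ValueError("word_index out of range")
    total = sum(len(word) for word in words)
    word_start = sum(len(word) for word in words[:word_index])
    if landmark == "phrase_edge":
        first_high = total - 1
    elif landmark == "word_boundary":
        if word_index == 0:
            raise ValueError("a phrase-internal boundary needs word_index >= 1")
        first_high = word_start
    elif landmark == "word_penult":
        first_high = word_start + max(len(words[word_index]) - 2, 0)
    else:
        raise ValueError("unknown landmark: " + landmark)
    return ["L"] * first_high + ["H"] * (total - first_high)


# runa wayinchaw, cf. (118): only the final syllable is high.
print(rising_contour([["ru", "na"], ["wa", "yin", "chaw"]], "phrase_edge"))
# aptasha kaykaan, cf. (119): rise on the penult of the pre-final word.
print(rising_contour([["ap", "ta", "sha"], ["kay", "kaan"]], "word_penult", word_index=0))
# Rising portion of tsay tikranqayki, cf. (109): rise at the word boundary.
print(rising_contour([["tsay"], ["tik", "ran", "qay", "ki"]], "word_boundary", word_index=1))
```

The tonal sequence is the same LH in all three calls; only the location of the transition changes, which is the point made in the paragraph above.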


6.1.5 Contours on single-word phrases 


In this section, the different intonational patterns as attested on individually phrased 
words will be discussed. Individually phrased here means that they were surrounded 
by (short) silent breaks on both sides and exhibited a recognizable complete tonal 
movement. That is not to say that individually phrased words are always surrounded 
by breaks, quite to the contrary. In section 6.2.3, only the presence of a recognizable 
tonal contour will be taken as sufficient evidence of phrasing, but in order to firmly 
establish this, the break criterion was used here to avoid ambiguities (in section 6.1.6
it is also used for operationalization).

Figures 141-146 provide examples of individually phrased Quechua words of
increasing length, from two syllables (Figure 141), over three syllables (Figure 142)
and four syllables (Figure 143 for rises and Figure 144 for falls) to five or more syllables
(Figure 145 for rises and Figure 146 for falls). The selection is somewhat shaped by
necessity especially in the range of five syllables and more: those words are rarer
than shorter ones, and individually phrased words by themselves are infrequent.
Nevertheless, I believe they represent the range of attested intonational variation
in the corpora, with the exception of the optional patterns particular to Spanish
loanwords (see section 6.1.8). The following observations can be made.

[Figure 141 panels, bisyllabic words. A (rises): 1 kanan ‘now’, KP03 Conc Q 1372; 2 hara ‘corn’, XU31 Conc Q 1477; 3 kay-chaw ‘here’, QF16 Conc Q 1737; schematic (i) LH. B (falls): 1 wayi ‘house’, OA32 Conc Q 0116; 2 wayi ‘house’, tsuklla ‘hut’, LC34 Conc Q 0384; 3 qillay ‘money’, dolar ‘dollar’, XQ33 Conc Q 0592.]


Figure 141: Individually phrased bisyllabic Quechua words. Row A 1-3:²⁸⁹ words with a rising contour.
Row B 1-3:²⁹⁰ words with a falling contour. A (i) and B (i) are schematized tonal representations of
bisyllabic rising and falling phrases, respectively.


Variability increases with syllable count: in phrases consisting of bisyllabic words, 
all rising contours consist of an L tone aligned with the first syllable and an H tone 
aligned with the second one and all falls are their mirror image in this respect, with 
the penult high and the final syllable low. This fact is reflected in there being only 
two schematic tonal representations, one for rises (A(i)) and one for falls (B(i)), for 
words of this size in Figure 141. This suggests that tonal alignment with phrase edges 
trumps alignment with the word penult under pressure of crowding; otherwise, we 
could expect to see high penults/initial syllables in some of the rising contours. It also 
suggests that rising-falling contours are not separately identifiable in phrases of this 
length. More variation is introduced beginning with phrases consisting of trisyllabic 
words. There are now two patterns for rises and falls each: in rising contours, the 
difference consists in whether the rise takes place before or after the penult, with 


289 A1 https://osf.io/yuztx/; A2 https://osf.io/jb68t/; A3 https://osf.io/zx4dp/ 
290 B1 https://osf.io/uks6e/; B2 https://osf.io/f4ycp/; B3 https://osf.io/2z5wq/ 
[Figure 142 panels, trisyllabic words.
A - rises: 1 pitsana ‘broom’ (ZR29 Conc Q 1206); 2 pinkullu ‘flute’ (XU31 Conc Q 0725); 3 tsay-pita ‘then’ / ‘from there’ (XU31 MT Q 1855).
B - falls: 1 ta-yaa-naa ‘[they] lived’ (XQ33 Cuent Q 0775); 2 pitsana ‘broom’ (SG15 Conc Q 1414); 3 pinkullu ‘flute’ (OA32 Conc Q 0500).]

Figure 142: Individually phrased trisyllabic Quechua words. Row A 1-3: words with a rising contour,
A1²⁹¹ with the rise only on the final syllable (schematic representation in A(ii)), A2-3 with the rise
on the penult and maintained height on final syllable (schematic representation in A(i)). A1-2 are
monomorphemic, A3 is bimorphemic. Row B 1-3: words with a falling contour, B1²⁹² with the initial
syllable low and a rise to the penult (schematic representation in B (i)), B2-3 with both initial syllable
and penult at a high level (schematic representation in B (ii)). B1 consists of three morphemes, B2-3
are monomorphemic.


the final syllable being always high and the penult thus varying between high or 
low, in parallel to what was observed in multi-word phrases. Pattern A(i) in Figure 
142, with the penult high, is the more frequent one in the corpora considered here. 
In falling contours, the difference consists in that between rising-falling contours, 
where the contours start low, realizing a rise before the fall (B(i)), and only-falling 
contours, which already begin high (B(ii)). In both patterns, the penult is high and 
the final syllable low. The only-falling contour seems to be somewhat more frequent 
than the rise-falling contour for individually phrased trisyllabic words.²⁹³ With an


291 A1 https://osf.io/cqr9m/; A2 https://osf.io/u9ntz/; A3 https://osf.io/f98ph/ 
292 B1 https://osf.io/fsd43/; B2 https://osf.io/atdur/; B3 https://osf.io/cjf3s/ 
293 This relative frequency is reversed in phrases with more syllables, cf. section 6.2.3.2. 
[Figure 143 panels, tetrasyllabic words, rises (in order of appearance): terceru-chaw ‘in the third’ (AZ23 Conc Q 0122); aywa-ykaa-nqa-n ‘his/her going’ (AZ23_Cuent_Q_0958); tsay-pita-na ‘now then’ (2224 Cuent Q 1739); kinray-ni-n-pa ‘along in front of it’ (LC34 MT Q 2515).]

Figure 143: Individually phrased tetrasyllabic Quechua words realized with a rising contour.²⁹⁴ 1: word
beginning with a low stretch, with a rise and peak in the penult followed by high plateau maintained
in the final syllable (schematic representation in (i)); 2: word beginning low, with most of the rise
taking place already in the initial syllable, high plateau maintained until the end of the word/phrase
(schematic representation in (ii)); 4: word beginning with a low stretch, with a rise and peak only
in the final syllable (schematic representation in (iii)); 3: word where the rise begins already in the
antepenult and continues into the final syllable (intermediate between all three representations).
All words are multimorphemic.²⁹⁵


additional syllable, yet a further pattern each for the rises and the falls is attested. 
In individually phrased words of 4 syllables with a rising contour, there is now also 
a pattern in which (most of) the rise already takes place in the first syllable, and the 
rest of the word is realized with a high plateau ((ii) in Figure 143). There also seem 
to be intermediate realizations, with the rise taking place over several syllables. 
It was harder to find good examples of this early-rising pattern than for the other 
patterns. For the falling contours, there is also a third pattern, splitting the rise-falls 
in two: in the first, the initial low stretch extends over the first two syllables, so that 
only the penult is high ((i) in Figure 144), while in the second, the rise occurs already 


294 1 https://osf.io/cn9aj/; 2 https://osf.io/4bkp8/; 3 https://osf.io/cnw9x/; 4 https://osf.io/c4d2b/ 
295 There are only very few if any Quechua words with roots longer than three syllables. 
[Figure 144 panels, tetrasyllabic words, falls: 1 miku-yaa-naa ‘(they) ate’ (XQ33_Cuent_Q_0876); 2 tinku-ski-r-qa ‘(when) meeting’ (HA30_Cuent_Q_0097); 3 tinku-ri-yaa-naa ‘(they) met’ (HA30_Cuent_Q_1044); 4 wanu-paku-sh ‘burial’ (KP04 Conc Q 1576); 5 and 6 ultimu-n-chaw ‘in the last one’, ushana-n-chaw ‘at the end’ (QF16 Conc Q 0020).]

Figure 144: Individually phrased tetrasyllabic Quechua words realized with a falling contour with
time-aligned syllabic transcription.²⁹⁶ 1 and 3:²⁹⁷ words beginning with a low stretch, with a rise
and peak in the penult followed by a fall to the final syllable (schematic representation in (i)); 4 and
possibly 6: words beginning low, with a rise reaching its peak already in the antepenult, plateau
maintained until the end of the penult, fall to the final syllable (schematic representation in (ii)); 2 and
5: words beginning with a high plateau, fall beginning in penult and falling to final syllable (schematic
representation in (iii)). All words are multimorphemic.²⁹⁸


296 1 https://osf.io/8kjng/; 2 https://osf.io/h3rjy/; 3 https://osf.io/ugahr/; 4 https://osf.io/s6mtg/; 5 & 6 
https://osf.io/v9epg/ 

297 Analyzing /rija:/ as monosyllabic [fja] with a complex onset in tinkuriyaanaa, which is clear- 
ly most faithful to the actual realization, goes against Quechua phonotactics described in Parker 
(1976: 55-56). If analyzed faithfully to the phonotactics as described, the word would be pentasyl- 
labic. 

298 With differing degrees of transparency. Wanupakush (also wanupakusha) is used in our
data to refer to an image depicting a coffin surrounded by mourners, for which the same speakers
use entierro ‘burial’ or funeral ‘funeral’ in Spanish. It is probably best analyzed as
wanu-paku-sh (die-UNSPEC-PRTCP), containing the poorly understood -paku, itself possibly derived
from -pU-kU, a combination which “is often understood as indicative of a commercial or
professional activity, that is to say that the subject is beneficiary of an action directed at other
persons” (Parker 1976: 119, my translation). It also possibly indicates an aspect of involuntariness
in an action, with Carranza Romero (2003: 149) giving the meaning of the infinitival verb form
wanupakuy as “morírsele sin que pueda hacerse nada” [to die on someone without being able
[Figure 145 panels, pentasyllabic and longer words, rises (in order of appearance): eukaliptuchaw ‘in the eucalyptus’ (XQ33 Cuent Q 0775); aywa-ykaa-nqa-n-chaw ‘in his/her going’ (2224 Cuent Q 1296); akaka-kuna-wan ‘with the woodpeckers’ (HA30 Cuent Q 1044); alma-yki-ta-raa ‘still your soul (acc.)’ (XU31 Cuent Q 0648).]

Figure 145: Individually phrased pentasyllabic and longer Quechua words realized with a rising
contour.²⁹⁹ 1 and 2: words beginning with a low stretch, with a rise starting just before and peaking in
the penult followed by a high plateau maintained in the final syllable (schematic representation in (i));
3: word beginning low, with most of the rise taking place already in the initial syllable, high plateau
maintained until the end of the word/phrase (schematic representation in (ii)); 4: word beginning with
a low stretch, with a rise in the penult and peak only in the final syllable (schematic representation in
(iii)). All words are multimorphemic.


to do something] and “morírsele, ser responsable de la muerte de un animal o persona” [to die
on someone, to be responsible for the death of an animal or person], but that of ishpa-paku-y
(urinate-UNSPEC-INF) as “to urinate on oneself”, tushu-paku-y (dance-UNSPEC-INF) as “to dance
expecting thanks or payment” and miku-paku-y (eat-UNSPEC-INF) as “to eat at someone's
disadvantage”, i.e. to scrounge (all my translations). All of this points to a complex valency change
meaning, whose precise workings remain unclear. The change from base verb meaning to
derived verb meaning does not seem to be transparently the same between different verbs,
indicating historical changes leading to lexicalization. It is unclear how productive this morpheme
is. There is a further semantic stretch from the given meanings of the derived verb wanu-paku-
to the attested meaning ‘burial’ after the addition of the participle -sh (also -sha or -shqa). In
comparison, all the other words in Figure 144 are transparent results of productive morphology.
I have not found any indication that morphological composition has an influence on intonation
in the data here.

299 1 https://osf.io/fe78n/; 2 https://osf.io/rqj7e/; 3 https://osf.io/t98v2/; 4 https://osf.io/8d4jt/ 
[Figure 146 panels, pentasyllabic and longer words, falls (in order of appearance): tsaka-pa-ku-ski-naa ‘it got dark’ (2224 Cuent Q 1296); riqi-tsi-naa-paa ‘in order to let know’ (2224 Cuent Q 1783); tsaka-pa-ku-ri-naa ‘it got dark’ (QF16 Cuent Q 0154); manka-pita-qa ‘from the pot’ (OA32 MT Q 4950).]

Figure 146: Individually phrased pentasyllabic and longer Quechua words realized with a falling
contour.³⁰⁰ 1 and 2: words beginning with a low stretch, with a rise in or before and a peak in the
penult followed by a fall to the final syllable (schematic representation in (i)); 4: word beginning low,
with a substantial part of the rise taking place already in the initial syllable, high plateau maintained
until the end of or after the penult, fall to the final syllable (schematic representation in (ii)); 3: word
beginning with a high plateau, fall beginning in penult and falling to final syllable (schematic
representation in (iii)). All words are multimorphemic.


on the initial syllable, creating a high plateau-like realization on both penult and 
antepenult before the final fall ((ii) in Figure 144). Both patterns can be said to “continue”
the trisyllabic rise-fall to equal degree, each preserving only parts of it. The
plateau-forming rise-fall pattern seems to be less frequent than the other two. These 
three patterns persist also in individually phrased words of five or more syllables 
(Figures 145, 146), where some intermediate realizations are also found. The up to 
three variant patterns can be separated according to whether they need to make 
reference to the word penult as prominent position or not: of the three rising pat- 
terns, only the one which starts low and then rises so that both penult and final 
syllable are high ((i) in Figures 143 and 145) needs to make reference to such a posi- 
tion. Of the three falling patterns, none actually do in single-word phrases, because 


300 1 https://osf.io/qxm84/; 2 https://osf.io/asqex/; 3 https://osf.io/ga634/; 4 https://osf.io/efqnp/ 




rightmost alignment of the two right tones will also always generate the attested 
contours. Since alignment with the word penult is clearly an active pattern in the 
data overall (cf. previous section), it can be assumed that it underlies some of the 
contours on single-word phrases as well, leaving them ambiguous. 

Despite the variability, all attested patterns can be modeled using only three 
tonal configurations: LH for the rising contour, and LHL and HL for two falling 
contours, the rise-falling and only-falling contour, respectively. Comparing phrases 
of individual words with those composed of several ones reveals that there are no 
pitch events that pertain specifically to the level of individual words such as pitch 
accents — effectively, phrases consisting of several words behave as if they consisted 
of very long single words, except that they have the additional option of anchoring 
the tonal event of a rise or fall with one of the boundaries between the individual 
words they are composed of. Increasing the number of words per phrase does not 
increase the number of tones or pitch events, just the number of possible locations 
at which the tonal transitions may take place. This confirms the more specific pitch- 
based criterion for phrasing: only sequences on which either of these contours is 
fully realized should be taken to realize complete phonological phrases. 
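
The observation that longer phrases add possible transition sites rather than tones can be made concrete with a toy enumeration. The sketch below is a deliberate simplification and not the analysis developed in section 6.3: it assumes that each syllable is a TBU carrying one level, that rises end high on the final syllable, and that falls have a high penult and a low final syllable. The function names are hypothetical.

```python
def rising_patterns(n_syllables):
    """Syllable-level LH patterns: an initial L stretch followed by an H stretch."""
    return ["L" * k + "H" * (n_syllables - k) for k in range(1, n_syllables)]


def falling_patterns(n_syllables):
    """HL and LHL patterns: optional initial L stretch, high penult, low final syllable."""
    return ["L" * k + "H" * (n_syllables - 1 - k) + "L" for k in range(n_syllables - 1)]


# One, two and three patterns per direction for words of two, three and four syllables.
for n in (2, 3, 4):
    print(n, rising_patterns(n), falling_patterns(n))
```

For two to four syllables this yields the pattern counts described for Figures 141 to 144. For longer words the enumeration overgenerates, since only three patterns plus intermediate realizations are reported for words of five or more syllables.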

Schematic tonal representations for a few utterances that were previously dis- 
cussed (see (121)-(125)) are here provided to illustrate the insights gained so far. 
They use square brackets for the delimitation of phonological phrases (PhPs), which 
are taken to be the units at which tones are assigned, just as in Huari Spanish. The 
assumption here is that rising contours consist of an LH tone sequence, only-falling 
contours of an HL tone sequence and rising-falling contours of an LHL sequence. 
The leftmost tone always minimally occupies the leftmost TBU, and the rightmost
tone always minimally occupies the rightmost TBU. In rising-falling contours,
this leaves the H tone in between these two peripheral tones, exhibiting varying 
alignment according to the patterns described. All tones seek to satisfy multiple 
alignment constraints in opposing directions, symbolized via dashed arrows. 
This creates the observed low and high stretches. Their interaction, discussed in 
section 6.3, creates the attested differences in the location of the tonal transition. 
Only where a word boundary occurs medially within a phonological phrase is a
subscripted right bracket inserted. A penult is marked with an accent mark (ˈ)
and a line connects it with the H tone in the tonal tier only in the case that the
information of that syllable being prominent is necessary to determine the shape 
of the contour. In accordance with the argumentation in section 6.1.3, this is the 
case when analysis b from Figures 130 or 137 is more appropriate than analysis a, 
also e.g. when the penult is already high in rising contours realized on more than 
two syllables. At this point, the analysis can remain agnostic regarding whether the 
H tone associates in these positions, since only alignment is necessary to generate 




the observed contour shapes. When the vowel of a syllable is unvoiced it is shaded 
in grey and cannot serve as TBU (cf. (122)). The details of constraint interaction 
generating these contours will be dealt with in the OT-analysis (section 6.3). The 
main point here is to show that using these contour variants produced entirely 
from tones assigned at the phonological phrase level, a sufficient tonal representa- 
tion of the types of utterances encountered so far can be produced, leaving issues of 
information structural- or syntax-dependent phrasing aside. In particular, nothing 
indicates the need for potential higher-level (=IP) boundary tones: both falling and 
rising contours appear utterance-finally as well as medially without any difference 
in their shape apart from tone height. Utterance-final tonal movements, whether 
high or low, often have greater local excursion than medial ones. IP-final tones thus 
seem to only ever be copies of the final tone in the final PhP, likely obviating the 
need for such higher-level tones and possibly also non-recursive instantiations of 
higher prosodic levels here. 


(121) XU31 MT Q 1856 (cf. (104) and Figure 121) 
[tsæj ‘pi tæ]pnp [ul tu te]pw [ri 'kæn Kilpnp [jæ ku ‘pr tae] ppp [jar ga mu ‘fe talpnp 


| | | | 
Lf LHL LH Iesn Ia > <—H 


(122) TP03 Cuent Q 1663 (cf. (113) and Figure 131) 
[mao kwen te rr Jeax]pnp [hu?,, kwen tu talppp [kwen tæ ja mor) yen ta]pnp 
L-------------- > H L L H L L------------- >H L 


(123) XQ33_Cuent_Q_1302 (cf. (120) and Figure 140) 
[ham pr 'koy me] [Ej wj Kalpnp [2a ka ka ‘ku nay, ni jæf galpnp 


(124) KP04 Conc Q 1576 (cf. (110) and Figure 127) 
[tseb,,, ke}, Xa wamy;, kafi]pyp [wa nu pa kyf]pnp 


—— H L <-H L 


(125) HA30 Conc Q 0211 (cf. (108) and Figure 125) 
[wa fel» [gen ‘te tfulonr [20 kuf mi, kæj knlm» 


L H L eH Lh «——L 
6.1.6 Peak alignment in rise-falling contours in Huari Quechua and Spanish


In this section,³⁰¹ I provide quantitative results on peak alignment in similar-looking
contours, the rise-falling contour of Quechua and pitch accents on paroxytone
words followed by a low boundary tone in Spanish. In Huari Quechua, when the
peak occurs phrase-finally, it is sometimes realized within the penult of the last
word (cf. Figure 148), and sometimes later, in the final syllable (cf. Figure 147),³⁰²
even on the same lexical item. In contrast, in Huari Spanish, pitch accent peaks
nearly always align within the stressed syllable (cf. section 5.1.1.2, also Figure 149).
We saw that some Quechua contours are ambiguous between two analyses, one in 
which tones are only aligned with prosodic edges, and another where the H tone 
is aligned with the word penult. The latter analysis is apt for cases in which the 
peak here is aligned within the penult, but it can hardly account for those where 
it is aligned in the final syllable. I made the assumption that in the Quechua data 
overall, (at least) these two variant alignment patterns are both active. The main 
question of this section is whether this holds up under quantified measurements 
via a comparison to the Spanish data. If the results show that it does, this confirms 
the suggestion that the H tone in the Quechua contours, unlike the H of the Spanish 
LH* pitch accent, is not always aligned or associated with the word penult, but 
instead sometimes aligned with the right edge of the phonological phrase (cf. the 
discussions in sections 3.3.3, 6.1.3). 

The data considered here come from the Quien corpus, which was not used in
the preceding sections; these were mainly based on data from the Conc, Cuento, and
Maptask corpora. For all utterances from Quechua and Spanish Quien, word and syllable
boundaries were annotated and labelled manually in Praat. A script (cf. Appendix
C) was written to extract labels and durational measurements from these annotations,
and to create pitch objects from which f0 measurements were extracted.
Each pitch object created was inspected manually and perturbations due to consonantal
microprosody were removed so that they would not influence the results. Only
polysyllabic words were included in the analysis. Words were excluded when they
were fragmentary, insufficiently voiced, or when the overall pitch range within
the last two syllables did not exceed 7 Hz.³⁰³ Table 20 gives the number of words
remaining after this first process of elimination.
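
The exclusion criteria just listed can be sketched as a simple filter. This is only an illustration and not the Praat script from Appendix C: the `Word` record, its field names, and the three example words are hypothetical, and only the 7 Hz threshold is taken from the text.

```python
from dataclasses import dataclass
from typing import List

MIN_RANGE_HZ = 7.0  # pitch-range threshold named in the text


@dataclass
class Word:
    label: str
    n_syllables: int
    fragmentary: bool
    sufficiently_voiced: bool
    f0_last_two: List[float]  # f0 samples (Hz) from the final two syllables


def keep(word: Word) -> bool:
    """Return True if the word passes all exclusion criteria."""
    if word.n_syllables < 2 or word.fragmentary or not word.sufficiently_voiced:
        return False
    if not word.f0_last_two:
        return False
    return max(word.f0_last_two) - min(word.f0_last_two) > MIN_RANGE_HZ


# Invented example words, for illustration only.
words = [
    Word("a", 3, False, True, [180.0, 195.0, 170.0]),  # kept: range 25 Hz
    Word("b", 2, False, True, [180.0, 184.0, 186.0]),  # dropped: range 6 Hz
    Word("c", 1, False, True, [150.0, 190.0]),         # dropped: monosyllabic
]
kept = [w.label for w in words if keep(w)]
print(kept)
```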


301 Parts of the analysis in this section were presented as a poster at PaPE 21 (Buchholz 2021b).
302 Occasionally it is even realized in the antepenultimate or preantepenultimate, as seen in Fig- 
ure 131 and Figure 132, but these cases are not considered here. Their existence is however in itself 
an argument for the lack of a strict H tone alignment with the putatively stressed word penult in 
the Quechua data. 

303 See note 98 in section 5.1.1.2 for why this threshold was chosen. 
Figure 148: CJ35 Quien Q 1351³⁰⁵ (wh-question).


To operationalize only rising-falling contours for the analysis here, only words were
considered whose f0 maximum position within the final two syllables preceded
the f0 minimum position, i.e. which had an overall negative slope within the last
two syllables. This was considered appropriate because after inspection of the data,
complex pitch movements with more than one relevant peak within the last two syllables
and final low pitch were not found, as in the majority of the Huari Quechua


304 https://osf.io/mxd5q/ 
305 https://osf.io/veh3q/ 
Figure 149: 2224 Quien ES 0851³⁰⁶ (declarative).

and Spanish data. This leaves 254 Quechua words, 143 paroxytone Spanish words,
and 35 oxytone Spanish words. Proparoxytones were excluded because of their low
number.
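
The operationalization of a final fall can be sketched as follows. The function names, the sampled contour, and the sign convention (negative values meaning that the peak lies before the end of the penult) are assumptions made for illustration; the actual measurements were extracted with the Praat script in Appendix C.

```python
def peak_before_trough(times, f0):
    """True if the f0 maximum occurs earlier than the f0 minimum."""
    t_max = times[f0.index(max(f0))]
    t_min = times[f0.index(min(f0))]
    return t_max < t_min


def peak_from_penult_end(times, f0, penult_end):
    """Peak time relative to the end of the penult (negative = before that point)."""
    return times[f0.index(max(f0))] - penult_end


# Invented f0 samples (Hz) over the final two syllables of one word.
times = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]
f0 = [170.0, 185.0, 200.0, 190.0, 160.0, 150.0]

print(peak_before_trough(times, f0))            # the word counts as having a final fall
print(peak_from_penult_end(times, f0, 0.12))    # peak shortly before the end of the penult
```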


Table 20: Words from Quechua and Spanish Quien sorted according to length.


word length          Quechua        Spanish
                                    proparoxytone   paroxytone     oxytone
2 syllables          124 (34.7%)    -               143 (56.8%)    63 (85.1%)
3 syllables          122 (34.2%)    2 (100%)        77 (30.6%)     9 (12.2%)
4 syllables          80 (22.4%)     -               29 (11.5%)     2 (2.7%)
5 syllables          21 (5.9%)      -               3 (1.2%)       -
6 syllables          9 (2.52%)      -               -              -
7 syllables          1 (0.3%)       -               -              -
Total                357            2               252            74
number of syllables  1100           6               648            161


Figure 150 visualizes the distribution of the f0 maximum position values in the
final two syllables for Quechua words and Spanish paroxytone and oxytone words. 
The combined violin- and boxplot shows that in Spanish paroxytones, the great 
majority of peaks is located in the word penult, while a substantially larger portion 
of them is located in the final syllable in Spanish oxytones. For Quechua words, the 
distribution is broadest, with the majority of the values within the penult, but a 
number of them reaching also far into the final syllable. 


306 https://osf.io/rb6wf/ 
[Figure 150 plot. Title: “Peak position from end of penult, final falls”; groups: Spanish_ox, Spanish_parox, Quechua; x-axis: peak position from end of penult (seconds), from -0.3 to 0.2.]


Figure 150: Final peak position in the Quechua and Spanish Quien words with a fall in the final two
syllables. Combined violin- and boxplot for Spanish paroxytone, Spanish oxytone, and Quechua
words.


The barplot in Figure 151 gives the share of peaks that are located within the penult
(green) and the final syllable (red) for the three groups. For Spanish paroxytones,
in 9 out of 143 words (6.3%) the f0 maximum is located in the final syllable, for
Spanish oxytones it is 13 out of 35 (37.1%), and for Quechua words it is 41 out of
254 (16.1%). That only a minority of the peaks in Spanish oxytones occurs in the
final syllable is perhaps surprising given the observations on peak placement in
the Spanish chapter.

[Figure 151 plot. Title: “Peak location across Quechua and Spanish, final falls”; legend: H final, H penult.]

Figure 151: Final peak position in the Quechua and Spanish Quien words with a fall in the final two
syllables. Barplot showing location of peak in either penult or final syllable in Spanish paroxytone,
Spanish oxytone, and Quechua words.

Most relevant here is the difference between Quechua and Spanish paroxytones,
given that Quechua words are also putatively stressed on the penult. A
χ²-test on the Quechua and Spanish paroxytone data yields a significant result
(χ²(1) = 7.19, p = 0.007), suggesting that the observed difference in peak location is
indeed associated with the difference between Spanish paroxytone and Quechua
words. However, a further differentiation of the data is necessary. Since the aim
is especially to compare the similar-appearing phrase-final contours between
Quechua and paroxytone Spanish, the words were differentiated further according
to whether they were phrase-final or not, operationalized by whether they
were followed by a pause or not.³⁰⁷
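
The reported test statistic can be retraced from the counts given above (9 of 143 Spanish paroxytones and 41 of 254 Quechua words with the peak in the final syllable). The sketch below assumes a 2×2 test of independence with Yates' continuity correction; the text does not state which variant was used, and the function name is hypothetical.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: Spanish paroxytones, Quechua; columns: peak in final syllable, peak in penult.
chi2, p = chi2_yates_2x2(9, 143 - 9, 41, 254 - 41)
print(round(chi2, 2), round(p, 3))
```

Under this assumption the computation gives the values reported in the text, which suggests, without confirming, that the continuity correction was applied.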

The left combined violin- and boxplot in Figure 152 gives the peak position values
for Quechua and paroxytone Spanish for phrase-final and non-phrase-final words 
separately. The distributions appear to be somewhat different, with the medians 
located further towards the final syllable in the nonfinal words, but with the final 
Quechua words still showing the broadest distributional range. Linear regression 
models were used to further explore this. The goal was to find a model which best 
predicts the position of the f0 maximum in the final two syllables in phrase-final 


307 Pauses as indicators of phrase boundaries are somewhat problematic, as is pointed out oc- 
casionally throughout this work. They were chosen because they are easy to detect automatically 
and in order not to base the phrase boundary criterion on final pitch movement when it is the 
aim of the analysis to say something about final pitch movement. Separated into phrase-final and 
non-phrase-final words, the counts for Quechua are 113 (phrase-final), 142 (non-phrase-final); for 
paroxytone Spanish 56 (phrase-final), 87 (non-phrase-final). 
[Figure 152 plot. Title: “Peak position from end of penult, final falls”; groups: Spanish_parox_nonfinal, Spanish_parox_final, Quechua_nonfinal, Quechua_final; x-axis: peak position from end of penult (seconds).]


Figure 152: Final peak position in the Quechua and Spanish Quien words with a fall in the final 
two syllables. Combined violin- and boxplot for Spanish paroxytone and Quechua words in final vs. 
non-final position in the phrase. 


words with a final fall. The predictor variables were length of the penult, length of the
final syllable, and position of the minimum f0 value within the final two syllables. The
dependent variable of the f0 maximum position could in itself be measured in three
different ways: from the beginning of the penult, from its end, and from the end of
the final syllable. Combinations of predictor variables and measurement reference
points represent potential alignment scenarios. For the Spanish paroxytone words,
the best model (R² = 0.76) turned out to be one in which the f0 maximum position
was measured from the end of the final syllable with the predictor variable of the
length of the final syllable.³⁰⁸ The model results effectively mean that the final peak
is on average located within the penult, at a fixed position relative to its end. This
holds to a similar extent for words in phrase-final and non-phrase-final position,
explaining 69–76% of the variance in the data. Following the discussion in section
3.5, this can be taken as evidence that the tone responsible for it, the H of the LH*

pitch accent, is associated with the stressed penult in paroxytones, as expected for
Spanish. For Quechua, the same model applied to phrase-final words does not yield
good results, explaining only about 30% of the variance (R² = 0.29). Instead, the most
promising models use either all three predictor variables to predict peak position
from the end of the final syllable,³⁰⁹ or the length of the penult and the position of


308 These are the model results for paroxytone Spanish words with a final fall in phrase-final
position measuring from the end of the final syllable:

                               estimate β   st. error β   t-value   p-value
intercept                      0.16         0.02          7.56      <0.001 ***
length of the final syllable   0.95         0.07          13.1      <0.001 ***
F(1, 54) = 171.6, p < 0.001
R² = 0.76

Another model using both the length of the final syllable and that of the penult as predictors
reached a slightly better adjusted R² of 0.77, but with a diminished F(2, 53) of 94.16. Because model
fit always slightly increases with more predictors and because the results of the simple model are
easier to interpret, I discuss only those.

The results of the same simple model applied to non-phrase-final words are as follows:

                               estimate β   st. error β   t-value   p-value
intercept                      0.11         0.01          8.02      <0.001 ***
length of the final syllable   0.93         0.06          13.7      <0.001 ***
F(1, 85) = 189.8, p < 0.001
R² = 0.69

309 The model results for Quechua words with a final fall in phrase-final position using all three
predictors and measuring from the end of the final syllable are:

                               estimate β   st. error β   t-value   p-value
intercept                      0.01         0.03          0.3       0.76
length of the penult           0.77         0.11          7.14      <0.001 ***
length of the final syllable   0.67         0.07          9.4       <0.001 ***
position of the minimum        0.29         0.1           2.8       0.007 **
F(3, 109) = 43.28, p < 0.001
Adj. R² = 0.53

Because there are some outliers in the Quechua data that cannot be excluded on reasonable
grounds, a robust regression model using rlm was also fitted. It slightly changes the standard
errors compared to the standard regression but leaves all predictor variables significant. Because my
main aim here is to compare the Quechua data with that of the Spanish paroxytones, I'm sticking
with the normal regression results for better comparison. The position of the minimum in the
model is measured from the same position as the maximum. The minimum position values
themselves are represented moderately well with models that either take the length of the penult and of
the final syllable as predictors (adj. R² = 0.58), measuring from the beginning of the penult, or only
the length of the final syllable, measuring from the end of the penult (R² = 0.57), compatible with the
idea that the minimum is the realization of an L tone seeking proximity to the phrase boundary.
the f0 minimum in the final two syllables to predict it from the end of the penult.³¹⁰
The amount of variance in the data they explain only amounts to 53%, more than
20 percentage points less than for the Spanish paroxytones. Their results are also more
difficult to interpret. What they broadly amount to is the statement that the peak position in the
final two syllables in Quechua is somehow influenced by their duration, but even
that still leaves a lot of variation. The same models also fare worse when applied to
words not in phrase-final position.³¹¹ Crucially, the characteristic of the Spanish paroxytones
that they realize the peak at a relative distance to some landmark within
the penult, which is well captured by the model for those data, does not seem to hold
well for the Quechua data that show a much broader range of variation, confirming
the impression from the individual inspection of examples. This absence of an identifiable
landmark within the final two syllables that the peaks in Quechua are on the
whole orientated relative to suggests that there is more than one tonal alignment
pattern underlying the data here, at least one of which does not refer to the putatively
stressed penult (or that tonal alignment is in general much more variable).
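
Why the simple Spanish model amounts to a fixed peak position relative to the end of the penult can be seen from a short calculation with the phrase-final coefficients in note 308. The sketch assumes that the dependent variable is a positive distance in seconds, counted backwards from the end of the final syllable; neither the sign convention nor the unit is stated with the model tables, and the function name is hypothetical.

```python
def spanish_peak_from_final_end(final_len, intercept=0.16, slope=0.95):
    """Predicted distance (s) of the peak from the end of the final syllable."""
    return intercept + slope * final_len


# Subtracting the final syllable's own length gives the distance to the penult's end.
for final_len in (0.15, 0.20, 0.30):
    from_word_end = spanish_peak_from_final_end(final_len)
    before_penult_end = from_word_end - final_len
    print(f"final syllable {final_len:.2f} s: peak {before_penult_end:.3f} s before end of penult")
```

Because the slope is close to 1, doubling the length of the final syllable moves the predicted peak by less than 10 ms relative to the end of the penult. The Quechua models have no comparably simple reading, since their slopes for the penult and the final syllable both differ clearly from 1.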


310 The model results for Quechua words with a final fall in phrase-final position using length of
the penult and f0 minimum position, measuring from the end of the penult, are:

                               estimate β   st. error β   t-value   p-value
intercept                      0.02         0.03          0.9       0.35
length of penult               -0.86        0.1           -8.6      <0.001 ***
position of the minimum        0.27         0.06          4.74      <0.001 ***
F(2, 110) = 64.36, p < 0.001
Adj. R² = 0.53


311 These are the results for the same models applied to Quechua words in non-phrase-final position:

                               estimate β   st. error β   t-value   p-value
from the end of the final syllable
intercept                      0.01         0.03          0.3       0.77
length of the penult           0.57         0.11          5.01      <0.001 ***
length of the final syllable   0.72         0.08          8.9       <0.001 ***
position of the minimum        0.38         0.11          3.55      <0.001 ***
F(3, 138) = 43.6, p < 0.001
Adj. R² = 0.48

from the end of the penult
intercept                      -0.03        0.02          -1.27     0.21
length of penult               -0.52        0.11          -4.83     <0.001 ***
position of the minimum        0.33         0.07          4.43      <0.001 ***
F(2, 139) = 22.55, p < 0.001
Adj. R² = 0.23
We explore one further possible caveat. Huari Quechua has contrastive vowel
length. This raises the possibility that differences in syllable structure might influence
peak placement in the Quechua data.³¹² Table 21 gives the structure of the
final syllable (light or heavy) compared to the location of the f0 maximum in the
final two syllables (in the penult or the final syllable) in all Quechua words with a
final fall and only in those that are phrase-final. There are two options for which
syllables could be counted as heavy: only those with a long vowel, or also those
with a consonantal coda (both have been suggested in the literature). For none of
the options did χ²-tests produce significant results,³¹³ suggesting that heavy final
syllables, whether only long or also closed, are not simply associated with a peak in
the final syllable instead of the penult.


Table 21: Distribution of peak location in the final two syllables against
syllable structure of the final syllable in Quechua words with a final fall.

Only syllables with long vowels          Closed syllables and those with
counted as heavy                         long vowels counted as heavy

Words with a final fall in both final and non-final position

Final syllable     Peak location         Final syllable       Peak location
                   penult   final                             penult   final
light (CV, CVC)    188      34           light (CV)           132      23
heavy (CV:)        26       7            heavy (CV: & CVC)    82       18

Words with a final fall in phrase-final position

Final syllable     Peak location         Final syllable       Peak location
                   penult   final                             penult   final
light (CV, CVC)    94       15           light (CV)           69       10
heavy (CV:)        2        2            heavy (CV: & CVC)    27       7
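
The reliability issue with the χ²-test for the phrase-final counts can be made concrete with a short calculation. The following is a minimal sketch, not the procedure actually used for the reported tests: it takes the phrase-final counts from Table 21 with only long vowels counted as heavy, computes the expected cell counts under independence, and runs a two-sided Fisher's exact test using only the Python standard library. The function and variable names are hypothetical and chosen for illustration.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as probable as the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: light (CV, CVC) and heavy (CV:) final syllable.
# Columns: peak in the penult, peak in the final syllable.
phrase_final_long_only = [[94, 15], [2, 2]]

expected = expected_counts(phrase_final_long_only)
small_cells = sum(value < 5 for row in expected for value in row)
p_value = fisher_exact_two_sided(phrase_final_long_only)

print("expected counts:", expected)
print("cells with expected count < 5:", small_cells)
print("Fisher exact p-value (two-sided):", round(p_value, 3))
```

Under these assumptions, two of the four expected counts fall below 5, and the exact test gives a p-value of roughly 0.11, which is in line with the non-significant result reported for this configuration.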


However, if the syllable structure of both final syllables (penult and final syllable) is 
considered in tandem, an observable effect emerges. Figure 153 shows that the dis- 
tribution of final peaks in phrase-final Quechua words with a final fall reaches far 
further into the final syllable in words with a “right-heavy” final syllable structure, 
i.e. where the penult is light, but the final syllable is heavy ((C)VC or (C)V:), than in


312 This would be expected under Stewart (1984)’s account of Conchucos Quechua stress, and 
under Parker (1976)’s account of the stress system of some Ancash Quechua varieties. In contrast, 
Hintz (2006) claims that South Conchucos Quechua stress is not sensitive to quantity. Cf. the discus- 
sion in section 3.3.3. 

313 In the case of phrase-final words with only syllables with long vowels counted as heavy, the 
expected counts in two cells were <5, rendering the χ²-result unreliable. To remedy this, a Fisher's
exact test, which does not have that problem (Field et al. 2012: 816), was also executed on the 
counts. It also did not produce a significant result. 




[Figure 153 plot. Title: Peak position from end of penult in phrase-final Quechua words with a final fall. Recoverable category label: (C)V.(C)VC and (C)V.(C)V:]

Figure 153: Final peak position in phrase-final Quechua Quien words with a fall in the final two
syllables, sorted according to syllable structure of the final two syllables.


words with other final syllable structures. In a regression model, this effect is
significant,³¹⁴ but it does not obtain with words that are not phrase-final. This is some


314 These are the model results:

                         estimate β   st. error β   t-value   p-value

intercept (=(C)V.(C)V)   -0.11        0.01          -7.39     <0.001 ***
heavy.heavy              -0.05        0.03          -1.86     0.06
(C)VC.(C)V               -0.07        0.02          -2.87     0.005 **
(C)V:.(C)V               -0.01        0.02          -0.54     0.59
light.heavy              0.08         0.02          3.4       <0.001 ***

F(4, 108) = 9.35, p < 0.001
Adj. R² = 0.23

indication that the relative syllable weight between the two final syllables at the
end of a phrase has an effect on peak position. The model only explains about 23%
of the variation (adj. R²=0.23), however, so that this can only be part of the story,
perhaps describing only one alignment variant active in the sample. If the categorical
predictor variable of final syllable structure is replaced with the continuous
predictor variable of the difference in duration between penult and final syllable, a
similar effect is observed and such a model actually explains a greater share of the
variation (adj. R²=0.38).³¹⁵ This means that peaks move further rightwards relative
to the boundary between the final two syllables the more the length difference
between them tilts in favour of the final syllable. Plausibly, some of this is due to a
complex effect of syllable structure,³¹⁶ but more than that it suggests that the peak
simply seeks to be not too far away from the right phrase boundary. Still, more than
half of the variation remains unexplained in this way.
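
The duration-difference model can be written out as a short calculation. The sketch below is illustrative only: it plugs the rounded coefficients reported in footnote 315 into a linear predictor, and it recovers R² from the reported F-statistics via the standard identity R² = F·df1 / (F·df1 + df2) for a linear model with an intercept. The function names are hypothetical. That durations and peak positions are measured in seconds, and that negative peak positions mean "before the end of the penult", are assumptions made here for illustration.

```python
def r_squared_from_f(f_value, df_model, df_resid):
    """Recover R^2 of a linear model (with intercept) from its overall F-statistic."""
    return f_value * df_model / (f_value * df_model + df_resid)


def adjusted_r_squared(r2, df_model, df_resid):
    """Adjust R^2 for the number of predictors (n - 1 = df_model + df_resid)."""
    return 1 - (1 - r2) * (df_model + df_resid) / df_resid


def predicted_peak_position(penult_len, final_len, intercept=-0.14, slope=-0.46):
    """Predicted peak position relative to the end of the penult.

    The predictor is penult length minus final-syllable length, as described
    in footnote 315; the default coefficients are the rounded estimates
    reported there. Units are assumed to be seconds.
    """
    return intercept + slope * (penult_len - final_len)


# Categorical model (footnote 314): F(4, 108) = 9.35
r2_categorical = r_squared_from_f(9.35, 4, 108)
adj_categorical = adjusted_r_squared(r2_categorical, 4, 108)

# Duration-difference model (footnote 315): F(1, 111) = 67.69
r2_duration = r_squared_from_f(67.69, 1, 111)

# Hypothetical durations: a longer final syllable shifts the prediction rightwards.
longer_final = predicted_peak_position(penult_len=0.15, final_len=0.30)
longer_penult = predicted_peak_position(penult_len=0.30, final_len=0.15)

print("categorical model, adjusted R^2:", round(adj_categorical, 2))
print("duration model, R^2:", round(r2_duration, 2))
print("predicted position, longer final syllable:", round(longer_final, 3))
print("predicted position, longer penult:", round(longer_penult, 3))
```

With the reported F(4, 108)=9.35, the adjusted R² comes out at about 0.23, and F(1, 111)=67.69 corresponds to an R² of about 0.38, which agrees with the reported figures. The two example durations are invented and only show the direction of the effect: the more the length difference tilts towards the final syllable, the further right the predicted peak lies.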

For some languages, it has been found that early and late peak alignment in 
very similar-looking rising-falling contours differentiates between declaratives and 


They indicate that right-heavy final syllable structures move the final peak significantly to the 
right. 

The counts for the groups are: 35 ((C)V.(C)V); 11 ((C)VC.(C)VC); 25 ((C)VC.(C)V); 19 ((C)V:.(C)V); 19
((C)V.(C)VC); 3 ((C)V.(C)V:); 1 ((C)VC.(C)V:). The categories (C)VC.(C)V: and (C)VC.(C)VC were grouped
together as “heavy.heavy”, and (C)V.(C)V: together with (C)V.(C)VC as “light.heavy” because of the
low counts. A robust model using rlm was also fitted, with the same groups making significant con- 
tributions. Applying the model to non-phrase-final words did not yield significant results. 

315 These are the model results: 


                                     estimate β   st. error β   t-value   p-value

intercept                            -0.14        0.01          -17.54    <0.001 ***
length difference between            -0.46        0.06          -8.23     <0.001 ***
penult and final syllable

F(1, 111) = 67.69, p < 0.001
R² = 0.38

The predictor variable was calculated by subtracting the length of the final syllable from the length
of the penult, i.e. positive values indicate a relatively longer penult, and negative values a relatively
longer final syllable. The same model applied to non-phrase-final words only has an R² of 0.14 (F(1,
140)=23.29, p<0.001).

316 The effect of syllable structure should not simply be dismissed because it also has traces in 
Huari Spanish. In the audio stimuli for Spanish Cuento, the oxytone colibrí “hummingbird”
occurred. For some speakers, this was a new lexical item, because they use picaflor instead. When
they reproduced it in the retelling, they produced it variably as [koli brin] or [kule brin], with a
closed final syllable. This could suggest an unmarked preference for oxytones with a heavy final
syllable. The effect is subtle, however, since it does not occur with oxytones with light final
syllables the speakers have lexicalized, such as maní “peanut”.

polar questions, with the peak in polar questions aligning later than in declaratives,
e.g. in Neapolitan Italian (D'Imperio & House 1997) and Tashlhiyt Berber (Roettger
2017, cf. Roettger 2017: 68 for a discussion of further cases). Because the data here
come from both declaratives and questions, the difference in peak alignment could
be related to that. However, no significant difference could be found in peak
alignment between declaratives and polar questions,³¹⁷ excluding the possibility that
this difference in utterance type is responsible for the observed variation in the
Quechua data.


[Figure 154 plot. Title: Length of penult and final syllable in phrase-final Quechua words with a final fall dependent on peak position. Groups: final final peak; final penult peak; penult final peak; penult penult peak. Axis: Length (seconds), 0.2 to 0.6]

Figure 154: Length of the final two syllables in Quechua phrase-final words with a final fall sorted
according to whether the peak is realized on the penult or the final syllable. “final final peak” means
“final syllables in words with the peak on the final syllable”, “final penult peak” means “final syllables
in words with the peak on the penult”, and so on. The peak is located on the penult in 96 words, and
in 17 on the final syllable.


317 Wilcoxon rank-sum tests were executed on the difference in peak position between phrase-fi- 
nal words in declaratives and all other utterance types (neutral polar questions, biased polar 
questions, wh-questions) and on that between all types of polar questions vs other utterance types 
(declaratives and wh-questions), measuring peak position from the beginning and end of the pe- 
nult, and from the end of the final syllable. No results were significant. 

Finally, Roettger (2017: 47, 144) suggests that it is an indication of the association
of a H tone if the syllable on which it is realized is longer than one on which it isn't
realized, a process called “phonetic enhancement” (cf. section 3.5.3). Figure 154
shows the length of the penult and the final syllable, respectively, in phrase-final
Quechua words with a final fall, sorted according to whether the f0 maximum in
the last two syllables is located in the penult or the final syllable. It shows clearly
that syllable length increases on average when the peak is located on that syllable.
This holds for both the penult and the final syllable, and is independent of the fact
that the final syllable is longer than the penult on average (final lengthening). The
effect obtains for all Quechua words, not just phrase-final ones, and it is
statistically significant.³¹⁸ The observed enhancement effect for Quechua words refers to
syllable length depending on peak position, which is not the same as stress-induced
lengthening.³¹⁹ For the putatively stressed word penult in Huari Quechua,
Buchholz & Reich (2018) showed that it is not longer on average than surrounding
syllables. Word stress position should not differ between individual instantiations of
the same lexical item in the same prosodic context. Yet peak alignment does differ 
between individual instantiations of the same lexical item in final position here, as 
the comparison between examples like Figures 147 and 148 shows. Thus, the effect 
of lengthening according to peak position here is independent of stress. Most of the 
results in this section do not in fact particularly support an account whereby the H 
tone associates with the penult because it is (putatively) stressed. However, follow- 
ing Roettger (2017), the observed lengthening effect means that the H tone respon- 
sible for the peak does associate with the TBU on which it is realized. This would 
suggest that in the Quechua rising-falling contours observed here, at least some of 
the H tones associate, but not with a specific landmark within either of the two final 
syllables, and the property of stress is not relevant to this association. The phonetic 
enhancement effect together with the earlier observed effect that relatively longer 
or heavier syllables in the final two-syllable window seem to attract peak place- 
ment could also explain a puzzling finding in Buchholz & Reich (2018: 153-155): 
there it was found (based on other data than here) that CVC penults are longer 
than surrounding syllables to a higher degree than CVC syllables in other positions, 
while CV penults are shorter than surrounding syllables to a higher degree than CV 
syllables in other positions. This might be, at least partially, because the CVC penults 


318 A Wilcoxon rank-sum test on the length of the penult depending on final peak positions for 
phrase-final words with a final fall yielded a highly significant result (W=1350, p<0.001); for the
length of the final syllable the result is significant (W=520, p=0.017). 

319 This is observed in the Spanish data: penults in phrase-final and non-phrase-final paroxytones 
are significantly longer than in oxytones, according to a Wilcoxon rank-sum test (W=4206, p<0.001);
final syllables in oxytones are longer than in paroxytones (W=11628, p=0.001).

are slightly biased³²⁰ towards bearing the H tone and realizing the peak, making them
correspondingly even longer than other CVC syllables, while CV penults are slightly
biased against bearing the H tone and realizing the peak, resulting in them being
somewhat shorter on average than other CV syllables.

Overall, the results confirm the proposal that in part of the Quechua data, an 
alignment pattern is active in which tones do not associate with a putatively stressed 
syllable, but instead seek alignment with the edge of the phrase. In particular, peak 
alignment behaviour seems markedly different from that of Spanish words with 
a superficially similar rising-falling pitch contour, where it is uncontroversially 
due to association with a stressed syllable and alignment to a landmark relative 
to that syllable. It also seems likely that peak position in Quechua is affected by 
variable temporal constraints within the last two syllables, possibly influenced by 
syllable structure. The H tone responsible for the Quechua contour likely associates 
with the TBU on which it is realized at least in a subset of the cases. That so much 
variability has been found confirms the suggestion that there are several align- 
ment patterns active in the data, each contributing to the varied overall picture. In 
section 6.3.1, I will show that the Quechua rising-falling contour under discussion is 
indeed generated from more than one of the alignment constraint rankings which 
are independently motivated there. 


6.1.7 Extent of the phonological phrase and mapping to other categories 


Before moving on to the discussion of loanwords from Spanish, I want to make a 
short excursion exploring the extent of the phonological phrase and its mapping 
to the categories of word and utterance via two singular examples. First, KP04 - 
MT Q 0393 (Figure 155), with context (126). At 36.5, KP04 gives the information 
that the path in the map leads above the tree. After a pause, TP03 answers with an 
acknowledgement token (38.5). The delay in answering seems to be interpreted by 
KP04 as indicating less than full understanding," because he repeats his previous 


320 Assuming equal distribution of syllable types in all positions, a CVC penult is equally likely (p = 
1/3) to be followed by a final syllable that is CV, CV:, or CVC. In 1/3 of those cases, when it is followed
by a CV final syllable, it is even more likely to attract the peak than the penult is anyway, according 
to the results in Figure 153. For CV penults, it is the other way around. Therefore, statistically, CVC 
penults are slightly biased to attract the peak, while CV penults are relatively biased against it. 

321 See Kendrick & Torreira (2015); Bögels et al. (2015) on how delays in responding to various
conversational actions are associated with an increased likelihood of the response being
dispreferred from the perspective of the recipient, i.e. in this case, incomplete understanding or
agreement.

utterance at 39.3 as a request for confirmation.³²² That it is understood as such is
clear from TP03's rapid response of three repeated acknowledgement tokens. KP04’s
repetition, the confirmation-seeking request (Figure 155), is realized as a rise-falling
contour with word boundary alignment. A low stretch on the first word hacha is
followed by a high tone on the initial syllable of hananpa. The high tone is
maintained into nan, but then falls to reach a low target on the final syllable, from which
it then rises again towards the end of the phrase. Clearly, there must be at least an
additional H tone after the LHL of the rise-falling contour, aligned with the right edge
of the phrase. This is the only example in the Huari Quechua data in which such an
additional tone clearly occurs,³²³ an intonational hapax legomenon to my knowledge.


(126) TP03 & KP04_MT_Q_0365–0409³²⁴ (context for Figure 155)

time        KP04 (with the path        TP03 (without the path
(seconds)   on the map)                on the map)

36.5        (L     H------- L)
            hacha  hana-n-pa
            tree   above-3-GEN
            above the tree

37.2        (1.2)

38.5                                   (H    L)
                                       mhm   ya

39.3        (L     H    L    H)
            hacha  hana-n-pa
            tree   above-3-GEN
            above the tree

40.4                                   (HL       )
                                       ya ya ya


That this additional tone is there is not only unexpected from the findings about 
Huari Quechua phrasal contours so far, but also from the point of view of tone 
distribution to TBUs. Above we saw that the rising-falling contour is never realized 
on bisyllabic single-word phrases, and that the “displacement” of the H tone to the
penultimate voiced syllable of the phrase is also explained well by tonal crowding. 


322 The confirmation KP04 seeks is that TP03 has understood so that they can move on to the next 
landmark; not a confirmation that the information is correct (KP04 provides the path information 
here, not TP03). 

323 Another potential case occurs in the same Maptask later at 278.9, but contains several voice- 
less stretches and is therefore difficult to assess. 

324 https://osf.io/etny4/ 

Yet here, at least three tones are aligned on the two syllables nan.pa.
Morphologically, pa is not long (i.e. possibly bimoraic), according to Parker (1976: 84). Aside
from this tonal crowding problem, three explanations for the exceptional realiza- 
tion with four tones are possible here. The first is that this is the only token in the 
corpus of a rise-fall-rise contour, which is simply so rare that it's not encountered 
elsewhere. That is certainly not impossible, if this contour is associated with a par- 
ticularly marked pragmatic context condition. However, there are plenty of confir- 
mation-seeking questions occurring under comparable context conditions in the 
Quechua data, and they do not realize this contour. 


Figure 155: KP04_MT_Q_0393³²⁵ (confirmation-seeking question consisting of a falling contour plus
final rise).


The second is that this is a case of adding boundary tone movement from Spanish to an 
otherwise Quechua utterance. If this were a readily available option to the speakers, 
we might expect to see it more frequently, but it is equally not impossible. The third 
explanation is that the contour normally realized on the utterance-final particle aw
is here realized on -pa, with the segmental material of aw fully assimilated to it. This
is appealing because aw often occurs in confirmation-seeking and other biased ques- 
tions. Utterance-initially, it means “yes” (Parker 1976: 35), but utterance-finally it is 
used like a question tag in other languages, e.g. Spanish no or verdad. From the obser- 
vation of other examples it appears that this utterance-final aw, if realized in its own 
phrase, always realizes that phrase with a rising contour. This explanation entails that 
this utterance, consisting of a single word, is realized with two phonological phrases, 


325 https://osf.io/zmbpk/ 

one with a rise-falling, and an additional one with a rising contour. This might seem
outlandish, but Igarashi (2014: 466-467) makes the argument that this also
occasionally occurs in Japanese, when more than one Accentual Phrase (AP) is realized on a
single morphological word.³²⁶ This is not tantamount to saying that the prosodic word
is larger than the phonological phrase here, just that a phrase is not always as large or 
larger than a morphological word (or a word plus segmentally assimilated particle). 
That remarkable gaps can exist between the extent of morphological word(s) and of a 
phrase is also evident from another example, AZ23_Cuent_Q_0717 ((127)/Figure 156). 
AZ23 is telling about a speaker change between the two characters in the Cuento story. 


(127) AZ23_Cuent_Q_0717
hampi-pa  ni-n   ni-pti-n-qa        runa-qa     ni-n
heal-BEN  say-3  say-SUBDIFF-3-TOP  person-TOP  say-3
“‘for healing’, he said / when this is said, the man says”


Figure 156: AZ23_Cuent_Q_0717³²⁷ (two declaratives, consisting of two rise-falling and one rising
contour, respectively).


326 For French, where the non-isomorphy between lexical words and APs is uncontroversial, 
Jun & Fougeron (2002: 159-160) report that more than the expected number (one initial, one final) 
of accentual rises can occur on a single very long word (they use anticonstitutionnellement
“unconstitutionally”). They conclude that APs can exceptionally have more rises than the expected initial
and final one, but arguably a more principled analysis might be to assume, given that non-isomor- 
phy between morphological words and APs is established in the other direction (1 AP containing 
>1 word), that here more than one AP maps onto a single word. Since the additional rises show a 
tendency to occur near morphological boundaries, such an analysis would also preserve the delim- 
itative character of the AP pitch movements. 

327 https://osf.io/z4re9/ 

At the beginning, hampipa is part of the direct speech of one of the protagonists. It
is phrased separately with a rise-falling contour, followed by the verb of saying in 
the third person ni-n, which is also realized with a rise-falling contour. It realizes at 
least two (HL) or possibly three (LHL) tones on an at most bimoraic syllable. Phras- 
ing here separates the content of the direct speech from the quotative verb, and 
that again from the subsequent utterance niptinqa runaqa nin, which is realized as 
a single rising phrase on three morphological words. It announces a speaker switch 
(achieved by the subordinating marker under subject change -pti, cf. Parker 1976:
143-144), and establishes the referential frame under which the following utter- 
ance, giving the content of the other protagonist’s speech, is to be understood. The 
entire second utterance is thus contextually an introduction to the following one 
(note the topic marker -qa on both the verb niptin-qa and the noun runa-qa) and 
the proposition asserted by it is suspended and incomplete.³²⁸ That is likely why it
is realized with a rise and without any of its components individually phrased (cf. 
the findings in sections 6.2.2 and 6.4). Here a single phonological phrase thus real- 
izes one monosyllabic word, nin (possibly trespassing against otherwise active con- 
straints on tonal crowding), on the one hand, and an entire utterance consisting of 
three polysyllabic words, on the other. The mapping between morphological words 
and phonological phrases is thus clearly far from predetermined and certainly 
malleable in order to conform to constraints imposed by contextual meaning. The 
case represented by the first example, with more than one phrase possibly real- 
ized on one word, is less frequent than that represented by the second, where the 
non-isomorphism goes in the other direction. But given that the latter possibility 
is a fact, the former should not be dismissed out of hand. Mapping more than one 
phonological phrase to a single long morphological word might also go some way 
in explaining the very heterogeneously reported secondary “stresses” in Quechua
varieties (e.g. Parker 1976: 57-60; Adelaar 1977: 43-46; Stewart 1984; Hintz 2006; 
O'Rourke & Swanson 2013; cf. also the overview in Hintz 2006: 482-483). In the fol- 
lowing, we will not fully resolve the questions posed by these two examples. I will 
at times return to the discussion of tonal crowding and whether it can be explained 
by assuming a moraic structure. In the next section, we will move on to marked 
intonational patterns that are restricted to loanwords from Spanish. 


328 Weber (1989: 15) takes the verb of saying ni- to take sentential complements. 

6.1.8 Marked tonal alignment on Spanish loanwords


This section presents two exceptional intonational patterns. They only occur on 
lexical items that are etymologically of Spanish origin,³²⁹ and they crucially deviate
from the characterizations given for the previous tonal alignment patterns. The
frequency of their usage also varies considerably between the speakers in our
corpora. We have already encountered several examples containing words of
Spanish origin, where they do not behave any differently from words of Quechua
origin. In contrast, the first of the loanword patterns covered in this section is
characterized by the tonal transition, i.e. the rise or the fall, not occurring in one of the
positions already established (phrase boundary, word boundary, or word penult), 
but instead as determined by the location of the stressed syllable in the correspond- 
ing Spanish word. That is to say, while the alignment pattern with the word penult 
makes reference to a fully regularly determined prominent position, this pattern 
requires a lexically specified word stress position, but otherwise exhibits the same 
contours. In the second loanword pattern discussed here, the tonal transition 
familiar from the Quechua contours occurs where expected from one of the three 
established patterns, but in addition, a separate local pitch event takes place at the 
location of the stressed syllable in the Spanish word. I will call the first of these 
the “inherited” loanword pattern, because it seems somewhat exotic when viewed 
from the perspective of the rest of the prosodic system, formed by the demands of 
a quite different system (paying much more attention to a structure like stress), 
like an oddity inherited from a traveling grandparent. The second pattern with the 
additional pitch event I call the “grafted” pattern, because the lonely localized pitch 
event appears not quite to fit in with the rest of the intonation, like a branch from 
a eucalyptus grafted on to a fir. Spanish-origin words in a Quechua utterance do 
not obligatorily require one of the marked loanword prosodic patterns. Using these 
patterns seems at best one of several options available to the Huari speakers. In 
the last part of this section (6.1.8.3), the distribution of the intonational patterns 
for Spanish-origin words in some of the corpora will be investigated. In the next 
section I will first give a more detailed description. 


6.1.8.1 Description of the patterns 
In AZ23's MT_Q_1158 ((128)/Figure 157), abejakunapa is realized on the one hand
with a rising contour finally, with pitch reaching a high plateau in the penult [na] 


329 I use “Spanish loanword” as a shorthand to refer to this group of words, but this section will
show that their behaviour is far from homogeneous.

and extending to the final syllable [pa].³³⁰ Yet the syllable [be], stressed in the
Spanish abeja, realizes a substantial first rise, followed by a gentle fall. Note that it
is also realized with more duration and energy than most other open syllables in
this utterance. This is an instance of the mnemonically labelled “grafted” pattern,
where a local pitch event takes place on the location of the Spanish stressed syllable
in addition to the “normal” tonal make-up of the phrase.


(128) AZ23_MT_Q_1158
tsay-pita     abeja³³¹-kuna-pa  hana-n-pa    escoba-man  chaa-nki
DEM.DIST-ABL  bee-PL-GEN        above-3-GEN  broom-DEST  arrive-2
“from there along above the bees you get to the broom”


Figure 157: AZ23_MT_Q_1158³³² (declarative consisting of three rising and two falling contours).


330 I analyse the contour on abejakunapa as rising because I assume that the incipient fall on 
the final [pa] is due to the upcoming initial L of the phrase realized on hananpa and not because 
of a final L of the phrase realized on abejakunapa, which would make it a falling contour. This is 
because pitch only reaches the low elbow in [ha] instead of [pa]. However, the transition between 
hananpa and escobaman shows that supposedly the same underlying tones can also result in a 
much less ambiguous phonetic realization. To a certain extent, such analyses will always leave 
some space for interpretation and must also take recourse to external considerations like the hy- 
pothesis that rising contours signal continuation and falling ones finality (cf. section 6.2.2), making 
this here more likely to be a rising contour. The circularity in this argumentation can probably not 
be entirely avoided. 

331 Lexical items or stems originating historically from Spanish in the examples in this section 
are italicized. 

332 https://osf.io/9u4s6/ 

Compare this on the one hand with escobaman in the same utterance, which does
not seem to pay much attention to the Spanish stress position at all, as far as can be
made out, and on the other with the realizations of terceruchaw in OZ14_Conc_Q_0331
(see (129)/Figure 158) or primeruchaw in QZ13_Conc_Q_1516 ((130)/Figure 159): both
of these are produced with a rising contour, with the rise taking place not on the 
phrase penult or the initial syllable, but at the location of the stress in the Spanish 
word, i.e. the antepenultimate, [se] and [me] respectively. 


(129) OZ14_Conc_Q_0331
segundu  fila-chaw  terceru-chaw  chuqllu
second   row-LOC    third-LOC     corncob
“in the second row, the third one is the corncob”


Figure 158: OZ14_Conc_Q_0331³³³ (declarative with three rising contours and one falling contour).


(130) QZ13_Conc_Q_1516
tercera  fila  primeru-chaw  riku  runa
third    row   first-LOC     rich  person
“in the first one the third row [is] the rich person”


333 https://osf.io/e2b4y/ 




Figure 159: QZ13_Conc_Q_1516³³⁴ (declarative with two rising contours and one falling contour).


These are instances of the “inherited” pattern, where the Spanish stress determines
the location of the tonal transition in the phrase contour, but does not produce an 
additional tonal event. The “inherited” pattern isn't always identifiable unambig- 
uously: in both utterances, the first word, segundu and tercera, respectively, looks 
like it might also follow this pattern, with elevated pitch and more duration on the 
Spanish stressed syllable, followed by a fall. However, this might also be an instance 
of the rise-falling contour with one of the previously seen alignment patterns.³³⁵
Similar considerations pertain to many bi- and trisyllabic words of Spanish origin 
in the data. It is noteworthy that only very few cases were found where the
“inherited” pattern is unambiguously realized in a falling contour. This might suggest that
the rise on Spanish accented syllables is part of what is emulated in the inherited 
pattern, not just the abstract property of accentedness. 

The following two utterances, OZ14_Conc_Q_0013 and XU31_MT_Q_2837 ((131)/
Figure 160 and (132)/Figure 161, respectively), are further examples of the forms
that words with an “inherited” or “grafted” accent can take.


(131) OZ14_Conc_Q_0013
segundu  fila-chaw  primeru-chaw
second   row-LOC    first-LOC
“in the second row, in the first one”


334 https://osf.io/3ejw9/ 

335 It is an open question whether in a bilingual community such superficially similar forms are 
distinguishable at the level of phonological representation in all cases. Their convergence is dis- 
cussed in section 7.4. 

Figure 160: OZ14_Conc_Q_0013³³⁶ (part of a declarative consisting of three phrases).


(132) XU31_MT_Q_2837
manka  ladu-n-chaw  atuq  tsay      frenti-n-chaw-raa-mi
pot    side-3-LOC   fox   DEM.DIST  front-3-LOC-CONT-ASS
ka-ykaa-n   murcielagu
COP-PROG-3  bat
“next to the pot, still before the fox, there is the bat”


Figure 161: XU31_MT_Q_2837³³⁷ (declarative with four rising contours and one falling contour).


336 https://osf.io/judpy/ 
337 https://osf.io/qbuk3/ 
In OZ14 Conc Q 0013 (Figure 160), the pitch movement on filachaw takes place
almost completely on the first syllable, which would be stressed in the Spanish fila,
while in primeruchaw, the rise on the Spanish stressed syllable [me] is less steep,
but a further phrase-final rise can be identified much more unambiguously. In
XU31 MT Q 2837 (Figure 161), different lexical items, even though all of Spanish
origin, can be seen to display different prosodic behaviour within the same utterance:
on manka ladunchaw, the rise ignores the Spanish stressed position [la] and
is realized instead on the phrase penult, [ðun]. On tsay frentinchawraami,
by contrast, the Spanish stressed syllable [pren] is elevated, and pitch falls again after it
before producing a second rise-fall pertaining to the phrase-final tones, in an
instance of the “grafted” pattern. On murcielago, again, it is the phrase or word
penult that attracts the rise, while the Spanish stressed position [sje] is ignored. In
section 6.1.8.3 we will see that forms derived from Spanish lado are almost never 
realized with an “inherited” or “grafted” pattern, while for murcielago, there is 
much more variation in the speech of speaker XU31. This indicates that the use 
of the marked loanword patterns is both lexicalized and speaker-dependent. In 
most of the words seen here, the Spanish stressed syllable, if it is open, is produced 
longer relative to other open syllables in its surroundings, no matter whether it is 
realized with the marked loanword patterns or as part of a phrase realized with one
of the previously discussed patterns. This stands in contrast to what was observed 
about the penult in Quechua-origin words, which is not usually longer independ- 
ent of peak position, but is coherent with similar duration behaviour in the Huari 
Spanish data (cf. section 6.1.6). This observation is relevant because it might be 
related to the following analytical challenge: in rising contours on bisyllabic words 
of Quechua origin (cf. Figure 141), the first syllable was seen to be usually almost 
fully low and only the second high, with the transition between them taking place 
“between” the syllables, especially when voiceless segments intervene between the 
two nuclei. This has been assumed to be due to constraints against tonal crowding 
overruling any which align the H tone with the penult (but cf. section 6.1.7). In 
several phrase-initial bisyllabic words of Spanish origin, on the other hand, a sub- 
stantial part of the rise takes place clearly already on the first syllable, suggesting 
that this might be a “grafted” pitch accent and that its faithful realization is unimpeded
even by the constraint against tonal crowding. One example of this was the
contour realized on riku in (130)/Figure 159. Further ones occur on nubi in OZ14
MT Q 1964 ((133)/Figure 162) and bolsan in XU31 MT Q 1081 ((134)/Figure 163).


(133) OZ14 MT Q 1964
nubi  trenu-kuna-wan  ka-ykaa-n
cloud thunder-PL-INST COP-PROG-3
“there is a cloud with thunder”
Figure 162: OZ14 MT Q 1964³³⁸ (declarative with a falling and two rising contours).


(134) XU31 MT Q 1081
bolsa-n  apta-ku-shqa
bag-3  grab-MED-PRTCP
“grabbing a bag”
Figure 163: XU31 MT Q 1081³³⁹ (part of declarative consisting of two rising contours).


338 https://osf.io/rcqxa/ 
339 https://osf.io/fr2py/ 
Nubi in Figure 162, in addition to the pronounced rise fully taking place on the
initial syllable, also lengthens that syllable considerably, and the vowel is in fact 
realized as a diphthong [ou]. In Figure 163, the vowel in the first syllable of bolsan 
is not lengthened like that and the syllable is not actually longer than the others. 
This is possibly because a CV:C syllable would go against Quechua syllable struc- 
ture, with CV: or CVC being maximal, and potential CV:C syllables usually reduced 
to CVC (Parker 1976: 56). However, the syllable here is preceded by a substantial 
period of nasalized phonation (segmented separately in the syllable tier in Figure 
163 and transcribed with <m>), resulting in the auditive impression that the initial 
voiced obstruent is strongly prenasalized, yielding [mbolsan]. Such phonetic prenasalization,
called “nasal leak”, occurs frequently in voiced obstruents in phrase-initial
position in languages where voiced obstruents are characterized by having
negative voice onset time. It is one possible solution to the articulatory problem 
of having to build up subglottal pressure to maintain voicing long enough for it to 
be recognizable while at the same time being forced to release the labial closure 
due to increasing supraglottal pressure, the other being devoicing of initial voiced 
obstruents (cf. Solé 2012, 2014). It is conceivable that here, this articulatory
phenomenon is actually exploited to enable the realization encountered in Figure 
163, where the nasal segment serves as carrier for the initial low tone, like an aux- 
iliary TBU, so that the high tone can be realized on the syllable bearing the accent 
in Spanish and the constraint against crowding is not violated. The nasal would 
then be epenthetic and tone-bearing. More controlled studies will have to be done 
to confirm or disconfirm such a hypothesis, but assuming the account is adequate, 
it points towards moras, not syllables being TBUs here. The stressed syllables in 
Spanish loanwords could then be bimoraic (cf. Buchholz & Reich 2017 for a similar 
account of a rhythmic phenomenon in Huari Spanish), and this would be reflected 
in their preferential realization as lengthened, diphthongized, or, where that isn't 
possible, with the insertion of an epenthetic sonorant segment like the prenasal. 
The prohibition of tonal crowding should then refer to the mora as TBU, and we 
should expect to see similar effects in Quechua-origin words with phonologically 
long vowels. However, this prediction does not always hold: e.g. in Figure 136, the 
word rikaa, with a phonologically and phonetically long final vowel, realizes the 
falling contour with the penult high, and all of the final syllable low. On the other 
hand, in Figure 129, the rise on uusha is realized on the first syllable containing 
the long vowel, parallel to the examples with Spanish loanwords above, although 
that syllable is not actually longer than others in the same utterance. Recall also 
from section 6.1.6 that in some of the utterances with falling contours, the longer 
the final syllable relative to the penult, the further to the right the peak is realized. 
Taken together, this is at least some evidence for tonal placement being sensitive to 
the length of the sonorant segments involved. Note that the evidence also suggests 
that phonological or morphological length by itself often seems not enough for a
syllable to exhibit behaviour indicative of it containing two TBUs. It must instead 
be actually phonetically long relative to other syllables, or diphthongized, or other- 
wise enhanced. It is quite likely that the observed variability is again indicative of 
several underlying patterns. Further research is needed to come to a more defini- 
tive conclusion, especially since the tonal crowding account with syllables as TBUs 
has proven very useful elsewhere in the analysis. 
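
The difference between the two assumptions about the tone-bearing unit can be made concrete with a small sketch. It is only an illustration and not part of the analysis itself: the function name `associate_lh`, the mora counts passed in, and the simplification that a rising contour consists of exactly one L followed by one H, each needing its own TBU, are all assumptions introduced here.

```python
from typing import List, Tuple


def associate_lh(syllables: List[Tuple[str, int]], tbu: str) -> List[Tuple[str, str]]:
    """Associate the tones L and H left to right, one tone per TBU.

    syllables: (label, mora_count) pairs; the mora counts are hypothetical
               inputs chosen by the caller, not measured values.
    tbu:       "syllable" (each syllable offers one slot) or
               "mora" (each syllable offers as many slots as it has moras).
    Returns (label, tones realized on that syllable) pairs.
    """
    if tbu not in ("syllable", "mora"):
        raise ValueError("tbu must be 'syllable' or 'mora'")
    tones = ["L", "H"]
    result = []
    for label, moras in syllables:
        slots = moras if tbu == "mora" else 1
        # Hand out as many of the remaining tones as this unit has slots.
        result.append((label, "".join(tones[:slots])))
        tones = tones[slots:]
    return result


# Syllables as TBUs: the rise can only be completed on the second syllable.
print(associate_lh([("bol", 1), ("san", 1)], "syllable"))
# Moras as TBUs, lengthened (bimoraic) first syllable: the whole rise fits on it.
print(associate_lh([("nu", 2), ("bi", 1)], "mora"))
# Moras as TBUs, epenthetic nasal carrying the L: H lands on the first syllable.
print(associate_lh([("m", 1), ("bol", 1), ("san", 1)], "mora"))
```

Under these assumptions, lengthening or diphthongizing the Spanish stressed syllable and inserting a prenasal both provide the extra slot that allows the H tone to be reached on that syllable without two tones sharing one TBU.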


6.1.8.2 A typological comparison 

Loanword prosody is still a somewhat underresearched branch of loanword phonol- 
ogy. Nevertheless, several studies have compared various cases and produced initial 
typologies (e.g. Kubozono 2006; Kang 2010; Davis et al. 2012). Davis et al. (2012) doc- 
ument case studies varying both source and target language along the traditional 
classification of “stress”, “tone”, and “pitch accent” language and present an initial
taxonomy sorting them along three binary parameters: 1) whether features of the
source language play a role in the assignment of loanword prosody in the target
language ([+/-SL] in their shorthand); 2) whether loanword prosody makes use of
rules or constraints specific to loanwords and not occurring elsewhere in the target
language ([+/-sp-loan]); and 3) whether the prosodic pattern in the adapted words
is determined by segmental or suprasegmental features ([+/-pros])³⁴⁰ (cf. Davis et al.
2012: 15). They also note further differences in loanword adaptation strategies: loan- 
words may recreate prosodic properties of the source word using the means of the 


340 At first sight it might look as if there must be overlap between the parameters, especially
between 1) and 3); however, they need not be dependent on each other: loanwords in Japanese are 
nearly universally accented following a rule first formulated by McCawley (1968): accent falls on 
the antepenultimate mora. According to Kubozono (2006), the same rule, reformulated in terms of 
a trochaic moraic foot and final syllable extrametricality (and thus of course effectively identical to
the Latin Stress Rule, cf. Hayes 1995) is also valid for the majority of accented Japanese words and 
compounds. Because loanwords are assigned accent uniformly according to this rule, independent 
of source language and where stress might lie in the source language, Davis et al. (2012: 25-26) 
class loanword prosody in Japanese as being independent from source language prosody with regard
to parameter 1), and because the rule assigning accent is quantity-sensitive, they class it as
being determined by suprasegmental rather than segmental features with regard to parameter 3).
An objection might be that if accent placement is determined by quantity, a feature of the source 
language, namely the syllable structure of the loanword, does play a role. However, this turns out 
to not be the case: taking sandwich > san.do.it.chi (μμ.μ.μμ.μ) as an illustration (cf. Kubozono 2006:
1141), because Japanese has such a restricted syllable structure, source language syllable structure 
is completely broken up, and only this segmentally refitted version of the loanword then undergoes 
accentuation, counting syllables not existing at all in the source language (the final syllable here) 
in the assignment process. 
target language on a syllable-by-syllable basis (as happens with English loanwords
in Hong Kong Cantonese, cf. Davis et al. 2012: 20-21), or they may recreate an overall 
tonal melody for words that is frequent in the source language but absent in the 
target language and generalize it to all loanwords from that source language, thus 
leveling differences between accent positions present in the source language (Japa- 
nese loanwords in Taiwanese Southern Min, cf. Davis et al. 2012: 22-23). They also 
state that a language may base its loanword prosody on phonetic properties of the 
source words, as in the case of English loanwords in Lhasa Tibetan: English words 
beginning with a voiced obstruent are realized with an L tone on the first syllable
of the loanword (bottle > po to ra [L.H.H]), but words beginning with a voiceless 
obstruent receive an H tone on the first syllable (police > pu li su [H.H.H]; the initial 
syllable of the word being the only place where there is a possible specification of 
tone being either H or L, all other syllables always being H), ignoring the stress posi- 
tion in the source word and also leveling the voicing distinction segmentally, which 
does not exist in Tibetan. Their interpretation is that this is due to the well-known 
articulatory property of voiceless obstruents of raising pitch in the following vowel 
(cf. Hombert et al. 1979; Hanson 2009; Kirby & Ladd 2016; Ladd & Schmid 2018), so 
that the initial high tone preserves as much of the phonetic properties of the source 
word as possible given the system of the target language (cf. Davis et al. 2012: 19-20). 

Applying this taxonomy to the Spanish loanwords in Quechua, with regards to 
1), both the “inherited” and the “grafted” pattern clearly pay attention to prosodic 
features of the source language, namely the acoustic prominence of the Spanish 
stressed syllable as realized by tonal movement and duration. They treat it differ- 
ently though: in the “inherited” pattern, what happens on the Spanish stressed syl- 
lable is somehow equated to the rise taking place in the rising contour, with the rise 
then produced on a syllable where it would not normally appear. In the “grafted”
pattern, what happens on the Spanish stressed syllable is treated as something sep- 
arate from the phrasal intonation of Quechua, but it is still emulated prosodically, 
as an additional pitch event localized to that syllable. In cases where the Spanish 
loanword is not realized with either the “inherited” or “grafted” pattern but treated
like any other word, the Spanish stressed syllable is at least still produced with 
longer duration if it is open. Thus, the source language prosody always plays a role, 
but it can surface in two different ways tonally or just via duration ([+SL]). With 
regards to 2), the “inherited” and “grafted” patterns clearly constitute prosody spe- 
cific to loanwords, since both require an irregular lexical specification of a prom- 
inent position. This specification is lexicalized, since it can cause the pitch event 
on the Spanish stressed syllable to appear on syllable positions in Quechua words 
that are outside the Spanish three-syllable window. The “grafted” pattern additionally
has a tonal specification that is unique to these words. In those cases where
words of Spanish origin are not realized with the “inherited” or “grafted” pattern, 
no loanword-specific prosody is employed, because long vowels are a regular
feature in Quechua. However, the option for a loanword-specific prosody exists
and is frequently employed, so overall we may classify our case as [+sp-loan]. With
regards to 3), it is clear that it is suprasegmental and not segmental features which
determine the resulting prosody of these loanwords, thus [+pros]. The property of
accentedness itself may be taken as that which effects the special prosody of these
loanwords, since the speakers are all bilingual, we rarely find any “wrong” stress
placement when they speak Spanish, and they can therefore be safely assumed to 
have robust knowledge about stress positions in their Spanish lexicon. Yet the fact 
that the “inherited” pattern seems to come up mostly in rising contours, and that 
pitch movement and duration are not equally distributed in the production of these 
words as loanwords in Quechua seems to indicate that the speakers take the pho- 
netic correlate of this accentedness not to be an inevitable property of the lexical 
word form, which is also reflected in their usage when speaking Spanish.

In sum, it is certainly possible to apply Davis et al.'s (2012) taxonomy to our
case. However, in the data here, several realizational varieties for these loanwords 
can be found, some of which would require variant classifications according to the 
taxonomy. 
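
The classification argued for above can be written down compactly. The following is a minimal sketch, assuming that each realizational variant is scored separately on the three parameters; the class name `LoanProsody`, the dictionary `HUARI_REALIZATIONS`, and the label "regular" for loanwords treated like any other word are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanProsody:
    """One cell in the three-parameter taxonomy of Davis et al. (2012)."""
    source_language: bool  # [+/-SL]: source-language features play a role
    loan_specific: bool    # [+/-sp-loan]: rules specific to loanwords
    prosodic: bool         # [+/-pros]: suprasegmental, not segmental, features decide

    def label(self) -> str:
        def sign(value: bool) -> str:
            return "+" if value else "-"
        return (f"[{sign(self.source_language)}SL, "
                f"{sign(self.loan_specific)}sp-loan, "
                f"{sign(self.prosodic)}pros]")


# Values follow the discussion in the text; "regular" is a label assumed here
# for loanwords that only show the durational reflex of Spanish stress.
HUARI_REALIZATIONS = {
    "inherited": LoanProsody(True, True, True),
    "grafted": LoanProsody(True, True, True),
    "regular": LoanProsody(True, False, True),
}

for name, cell in HUARI_REALIZATIONS.items():
    print(name, cell.label())
```

Seen this way, the variants agree on [+SL] and [+pros] and differ only in [+/-sp-loan], which is the sense in which a single label for the whole case abstracts over the observed variation.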

The loanword intonational patterns can be characterized in relation to the ones
seen before. The tonal alignment patterns make reference to different prosodic landmarks:
to the phrase-peripheral domain boundaries, to a phrase-internal boundary
of the next lower domain, the prosodic word, and to the word penult of the strongest
word in the phrase, a regularly determined prominent position independent of the
lexical identity and morphological makeup of the words involved. To this, only in
words of Spanish origin, the loanword patterns add a lexicalized and thus not fully
regular prominent position, the stress position “inherited” from the Spanish word
(e.g. (135), with the Spanish stressed syllable marked with an asterisk).


(135) “Inherited” accent on a Spanish-origin word (from OZ14 Conc Q 0331, cf. 
(129)/Figure 158) 
[ter se* rə tfUlppp 


The “grafted” pattern even produces a pitch accent at this position that is addi- 
tional to the phrase tones. In other words, while the “inherited” pattern affects 
the alignment of the phonological phrase tones of Quechua in providing a unique 
lexically specified position the H tone is attracted to, the “grafted” version causes 
pitch accent tones to be additionally assigned for that position. In this sense, the 
“grafted” pattern is in theory not an alternative to any of the patterns described 




before, but an addition, as it can possibly occur together with any of them. The addi- 
tional pitch accent of the “grafted” pattern can here be analyzed as LH* in ToBI-no- 
tation, because it seems to be characterized by a rise with a peak on the accented 
syllable. This is plausible in light of the findings from chapter 5 on Spanish, where 
this pitch accent was found to be the only regular pitch accent on stressed syllables 
in a variety of contexts. Because many of the words with a “grafted” accent (e.g. 
abejakunapa in Figure 157 or frentinchawraami in Figure 161) clearly exhibit a fall 
or return to low after this LH* accent on the Spanish stressed syllable, which then 
extends until the rise pertaining to the end of the phrase, another L tone should be 
assumed there. The easiest way to account for this is by assuming that alignment 
rankings for the LH* tones are as in the Spanish “main” variant (cf. section 5.3.2.1),
and that their alignment ranking dominates that of all the Quechua phrasal tones, 
so that they are all realized sequentially following the pitch accent tones. Then, the 
dip following the LH* peak is due to the presence of the L tone that is otherwise 
initial in Quechua phrases. For this, the LH* tones must be specified as different 
from the phrasal tones so that different constraints can apply to them. Since the 
tones in both Spanish and Quechua are assigned at the level of the PhP, the differ- 
ential specification might have to make reference to the two different languages. 
That would in fact capture the notion that these accents are somehow intrusive.³⁴¹


(136) “Grafted” pattern on a Spanish-origin word (from XU31 MT Q 2837, cf. 
(132)/Figure 161) with LH* pitch accent and two different sets of tones 
[tsej,, pren*tin tju 1a milp 


Ls H; Ly—>H, L, 


Using biological metaphors like “inherited” and “grafted” evokes images of cross- 
breeding, and of the admixture of elements from different pedigrees that do not 
really fit well together because they belong to different “systems”. At first glance, this
seems apt in the situation at hand, where two “language systems” are so obviously in
contact with each other and where we as linguists can so easily discern the “foreign” 
provenance of loanwords in the speech of these speakers because we are well-ac- 
quainted with the historiography of the two languages in question that treats them 
as separate for the majority of their documented and projected existence. However, 


341 Considering that intonational contours can be lexicalized on individual lexical items when oc- 
curring sufficiently often under similar context conditions (Calhoun & Schweitzer 2012; Schweitzer 
2012; Schweitzer et al. 2015), another possibility might be that the Spanish pitch accent tones have 
become part of the lexicalized representation. Then the alignment could refer to these “lexicalized” 
tones vs those assigned by the PhP. 
it is worthwhile to switch perspectives here and to consider the phenomenon from a
typological viewpoint. From the view of languages like (Northern varieties of) Basque
(Elordieta 1998; Hualde 1999; Hualde et al. 2002), Tokyo Japanese (Pierrehumbert 
& Beckman 1988; Kubozono 2008; Kawahara 2015), or Turkish (Levi 2005; Kamali 
2011), having a subset of the lexicon, so-called accented words, with prosodic specifi- 
cations in addition to the ones valid for all other, unaccented, words or phrases, is par 
for the course. To demonstrate: in (Tokyo) Japanese, accent manifests as an H tone on
the accented mora followed by an L tone on the following one, i.e. a fall that has been
analyzed as H*L in Japanese versions of ToBI (J-ToBI, cf. Venditti 2005). AP-initially,³⁴²
there is a rise with an L tone on the initial and an H tone on the following mora. If
the lexical accent is initial, it supersedes the AP-initial rise (cf. (137)a). A mora left
unspecified for tone by either of these mechanisms receives the tone from the rightmost
specified mora (variously analyzed as tonal spreading, copying or interpolation,
cf. Kawahara 2015: 449–450 and the works cited there). These mechanisms fully
determine the tonal shape of all Japanese words at the word- and AP-level (cf. for
trisyllabic words + an unaccented suffix in one AP³⁴³ (137)a-d). Accented words need
an additional lexical specification (location of the accent) that unaccented words are 
lacking, but behave the same otherwise. Accent position is almost exclusively cued 
by pitch; duration is not a good cue for it and also signals lexical contrasts non-culmi- 
natively (cf. Beckman 1986; Kubozono 2015 amongst others). 


(137) Tokyo Japanese (adapted from Kawahara 2015: 448–452)
a. inochi-ga (life-NOM) (initial accent)
i no chi -ga
H L


342 In all the works on Japanese, Basque and Turkish cited in this section, an Accentual Phrase 
(AP) or Phonological Phrase (PhP) is assumed to be the minimal prosodic unit at which tones are 
assigned and where accent is culminative. This manifests e.g. in the analysis for Japanese in that 
only one accented word can occur in an AP, optionally with additional unaccented words. It should 
be kept in mind that the phrase-initial (and/or phrase-final, in Turkish and Quechua) tones prop- 
erly belong to the phrase, and not to the word, as the pitch accent does. So in an AP with two 
unaccented words, like yamada-ga uta-tte-ru (Yamada-NOM sing-PROG-COP “Yamada is singing”), 
there is only one phrase-initial rise on the first word, and the second word is realized just as a 
high plateau, without further pitch events (cf. Venditti et al. 2008: 459). Adjusting for their specific 
prosody, this holds also for Basque and Turkish. Since APs frequently map to a single word, the 
regular phrase-final or -initial tones have often been interpreted as a regular lexical accent, e.g. in
traditional descriptions of Turkish (cf. Féry 2017). 

343 Higher-level prosodic units can add further tones leading to interactions with those given in 
an actual utterance. 
b. kokóro-ga (heart-NOM) (penultimate accent)
ko ko ro -ga
L H L
c. atamá-ga (head-NOM) (final accent)
a ta ma -ga
L H H L
d. miyako-ga (city-NOM) (unaccented)
mi ya ko -ga
L H
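
The three mechanisms described for Tokyo Japanese (accentual fall, AP-initial rise, filling of unspecified moras) can be sketched as a small procedure that derives the surface tone of every mora in patterns like (137)a-d. This is a simplified illustration, not an implementation of any of the cited analyses: the function name `tokyo_ap_tones` is hypothetical, the AP is assumed to contain a single word with at most one accent, and the filling step is assumed to copy the tone of the nearest specified mora to the left.

```python
from typing import List, Optional


def tokyo_ap_tones(n_moras: int, accent: Optional[int]) -> List[Optional[str]]:
    """Return one surface tone per mora for a single-word accentual phrase.

    n_moras: number of moras in the AP (at least 2).
    accent:  zero-based index of the accented mora, or None if unaccented.
    """
    if n_moras < 2:
        raise ValueError("this sketch needs at least two moras")
    if accent is not None and not 0 <= accent < n_moras:
        raise ValueError("accent index out of range")

    tones: List[Optional[str]] = [None] * n_moras

    # 1. Lexical accent: H on the accented mora, L on the following one.
    if accent is not None:
        tones[accent] = "H"
        if accent + 1 < n_moras:
            tones[accent + 1] = "L"

    # 2. AP-initial rise (L on mora 1, H on mora 2), superseded by an
    #    initial accent; an accentual tone on mora 2 is left untouched.
    if accent != 0:
        tones[0] = "L"
        if tones[1] is None:
            tones[1] = "H"

    # 3. Remaining moras copy the nearest specified tone to their left
    #    (assumed reading of the spreading/copying/interpolation step).
    for i in range(1, n_moras):
        if tones[i] is None:
            tones[i] = tones[i - 1]
    return tones


# Four-mora APs: a three-mora word plus an unaccented one-mora particle.
for name, accent in [("initial", 0), ("penultimate", 1), ("final", 2), ("unaccented", None)]:
    print(name, " ".join(tokyo_ap_tones(4, accent)))
```

The only input that distinguishes accented from unaccented words is the `accent` argument, which mirrors the point made in the text that accented words carry one additional lexical specification but are otherwise subject to the same mechanisms.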


Northern Biscayan Basque has a surprisingly similar system:³⁴⁴ words are divided
into accented and unaccented ones, the accent always takes the form of a fall after 
the accented syllable (H*L) and is free to occur on any syllable in the word except 
for the final one in most cases, there is a word-initial rise, and tones spread across 
unspecified syllables (leftwards from the accented syllable, according to Hualde 
1999: 950). In a position that is not directly preverbal (the focal position according 
to Hualde et al. 2002, in which unaccented words also receive a falling H*L accent 
in final position), accented words boast both a surface pitch event and a word- 
level tonal specification (the position of the accent) in addition to the ones found on 
unaccented ones (cf. (138)a-c; as in Japanese, higher-level tones will interact with 
the AP-level ones). Duration is also not a good cue for accent position, according to 
Hualde & Beristain (2018), although it is not used to mark lexical contrasts. 


(138) Northern Biscayan Basque (adapted from Hualde 1999; Hualde et al. 2002) 
a. gixon-an-ari (man-GEN.SG-DAT.SG “to the one of the man”) (unaccented)
gi xo na na ri
L H


344 Basque prosody is described as varying enormously despite the geographically limited area 
where it is spoken. Hualde et al. (2002: 548-549) profess a belief that prosodic differences can be 
delineated and spatially contained by establishing isoglosses, but also hint gently at the insight that 
the dialectological enterprise of pinning down the prosodic features of Basque both geographically 
and typologically must needs make use of idealization. 
b. gixon-án-ari (man-GEN.SG-DAT.PL “to the ones of the man”)
(antepenultimate accent³⁴⁵)
gi xo na na ri
L H H L

c. gixón-an-ari (man-GEN.PL-DAT.SG “to the one of the men”)
(preantepenultimate accent)
gi xo na na ri
L H L


According to Levi (2005) and Kamali (2011), Turkish also shares the distinction between
accented and unaccented words, and the accent in accented words is a fall with an
H tone on the accented syllable followed by an L tone (analyzable as H*L). AP-initially
it does not have a rise but an L tone, and APs in prenuclear position³⁴⁶ have
a final H tone. This has traditionally been taken to be a regular oxytone accent on
those words that are classed as unaccented in Levi (2005) and Kamali (2011) (cf.
Göksel & Kerslake 2005). Thus in prenuclear position, accented words also require
an additional lexical specification in comparison to unaccented ones and add 
a tonal specification and an additional pitch event to the phrase in comparison 
to a phrase made up only of unaccented words (cf. (139)a-d). In nuclear position, 
APs are realized with an initial and final low tone. This manifests as a low plateau 
followed by a final fall in APs with an unaccented word and accented words are 
differentiated from unaccented ones in that they realize this fall earlier, after the 
accented syllable which is sometimes preceded by a compressed rise (cf. Kamali 
2011: 74–77). Duration is not a reliable cue for accent position and is also not lexically
contrastive (cf. Levi 2005).


345 This variety of Basque has “preaccenting” suffixes (Hualde 1999): suffixes that cause the preced- 
ing syllable to be accented. Preaccenting often distinguishes plural suffixes from segmentally iden- 
tical singular ones. 

346 The literature on Turkish splits IPs up into a prenuclear, nuclear, and postnuclear part, based 
on an interplay of syntactic, prosodic and information structural criteria (cf. Féry 2017: 251-253). 
Here it is only relevant that the right edge tone of APs differs with position: in prenuclear position, 
it is Hg, in (post)nuclear position it is Ly. 
(139) Turkish (adapted from Göksel & Kerslake 2005; Kamali 2011)
a. okulda-ymış-lar (school-EVID-PL, “apparently they are/were at school”,
preantepenultimate accent on an AP in prenuclear position³⁴⁷)
o kul da y mış lar
L H L H
b. limónlu-ya (Limonlu-DAT “to Limonlu”, antepenultimate accent on an
AP in prenuclear position)
li mon lu ya
L H L H
c. bunal-an-lar-ı (get.overheated-REL-PL-ACC “those that get overheated”,
no accent on an AP in prenuclear position)
bu na lan la rı
L H
d. yönlendir-íyor (forward-IMPERF “s/he forwards”, penultimate accent
on an AP in nuclear³⁴⁸ position)
yön len dir i yor
L H L

These languages as well as others³⁴⁹ make a regular distinction between unaccented
and accented words, with accented words a minority in the lexicon. The


347 Turkish is also usually analyzed as having a class of preaccenting suffixes. The evidential copular
suffix -(y)mIş (capitalized vowels indicate a vowel subject to harmony) is such a suffix. The
imperfective -(I)yor causes the accent to fall on its first syllable, as indicated by the orthographic
accent. In a chain of suffixes, the leftmost preaccenting one determines accent position (cf. Gök- 
sel & Kerslake 2005: 29-33). 

348 I have not found any information on what happens when a word in prenuclear position is 
accented on the penult: the L of the H*L pitch accent and the phrasal H, would then possibly 
compete to occupy the only remaining rightmost TBU. Because of this lack of information, I use a
penult-accented word in nuclear position as example here. In that position, the right-edge phrase 
tone of the AP is Lg, and no such potential conflict ensues. 

349 For example, Swedish and some of the Germanic varieties in the South Low Franconian and 
Central Franconian tone accent area, following the privative analyses by Riad (1998) for Swed- 
ish, Gussenhoven (2000a, 2000b) for Roermund and Gussenhoven & van der Vliet (1999) for Venlo 
Dutch, Peters (2008) for Hasselt Dutch. Cf. also Féry 2017: 194-200 for a discussion. Modern He- 
brew also differentiates three noun classes according to stress position: words with final stress 
that always shifts to the final syllable when suffixes are added, words with penultimate stress that 
shifts in this way, and words with a lexically determined stress position that does not shift under 
difference in type frequency between the groups is also reflected in terms of structural
markedness: “accented” words are more marked in that they require further
specifications than “unaccented” ones. That is also true in Huari Quechua, when
comparing the loanword patterns with the others: the “inherited” and “grafted” 
patterns require a further tonal specification at the word level (a lexicalized accent 
position), and in the “grafted” pattern this also translates to additional tones and an 
additional pitch event. In this light, despite knowing about the historical origins of 
the “Spanish” words, we should not be averse to considering this particular varia- 
tion as simply one prosodic option (or two closely related options) available for a 
subset of the lexical items the speakers have at their disposal. As the discussion on 
loanword prosody has shown, despite variation in their realization it is clear that 
the loanwords are not produced just as if the speakers were speaking Spanish. Pho- 
nological phrases that contain them form a particular prosodic subtype in Quechua 
sharing attributes both with phrases with only “native” words and with Spanish 
stressed words. It makes sense to call these words an “accented” class in Quechua. 
The distinction between accented and unaccented words also seems to inter- 
act in a non-trivial way with loanword phonology in other languages: according to 
Kubozono (2008: 167), only 29% of native Japanese words are accented, but 93% of 
loanwords³⁵⁰ are. In a similar vein, Davis et al. (2012: 14) attest that the marked or
accented class of Modern Hebrew words, which is smallest among native words 
but to which most loanwords belong, is a modern innovation. They suspect, based 
on Zuckermann (2003), that it might have come into existence through the large 
share of vocabulary from Germanic and Slavic languages the modern language has 
adopted. In Turkish, unaccented words form the majority of native word stems, 
but most loanwords are accented (Göksel & Kerslake 2005: 26-27). In the Basque
case, Hualde (1993, 1999: 984) even argues that the present Western Basque accen- 
tuation system (including Northern Biscayan) arose from a historical development 
in which a system that originally had no word accent changed to the present dis- 
tinction between accented and unaccented words due to the large-scale adoption 
of accented loanwords from Latin and Romance languages. If that is correct, the 


suffixation. The last type is seen as marked, because it requires further constraints in addition to 
those acting on the first two classes (cf. Bat-El 2005; Davis et al. 2012). Becker (2003: 45-46) calls 
the first two classes ‘unaccented’ and the third ‘accented’. Loanwords from languages with lexical 
stress such as Germanic or Slavic languages or Arabic usually preserve their stress position and are 
of the last class, which has by far the fewest members among native words and might even have 
only been formed under the influence of those languages (Davis et al. 2012: 17; Zuckermann 2003). 
Hebrew is not usually analysed as a “pitch accent” language. 

350 From languages other than the various historical stages and varieties of Chinese. Words from
the latter, the so-called Sino-Japanese words, form a large and integral part of the modern Japanese
lexicon. Among them, the share of accented words is 49% (Kubozono 2008: 167).
beginning stages of this process could probably be imagined as a scenario quite
similar to the Quechua situation described here. Basque also shows a remarkable 
prosodic variety across the different regions where it is spoken, despite occupying a 
relatively small geographical area. Besides the pitch accenting system described for 
Northern Biscayan, there are also varieties without a distinction between accented 
and unaccented words, some with a fixed penultimate accent position and some 
where there is more than one accent shape (a rising one besides the falling one, 
cf. Hualde & Beristain 2018). This is also not wholly unfamiliar from our ongoing 
discussion on Quechua, except that here we find a similar breadth of variation in a 
corpus made up only of the speech of speakers hailing from and residing in basically
the same locality, the town of Huari and its environs. This suggests that the space 
of possibilities in both cases has similar dimensions. Other interesting parallels to 
the three languages here briefly discussed are that in Huari Quechua as well as in 
Japanese, Basque and Turkish (to a lesser degree), tones tend to copy (or spread) 
across tonally unspecified syllables, leading to plateaux and low stretches, and that 
duration is not a good cue for marking accent position (apart from in the words of 
Spanish origin in our data). This set of languages is however also separated by some 
properties (cf. also Ito 2002 for a prosodic comparison of Basque and Japanese): in 
Huari Quechua and Japanese, duration is employed in lexical contrasts, but not in 
Basque and Turkish. In Quechua and Turkish, phrase-initial L tones occur, while 
in Japanese and Basque there are initial rises (LH;). While Japanese, Basque and 
Turkish all have a lexical accent that has been analyzed as H*L, the pitch accent 
identified in the “grafted” pattern is LH*, like the most frequent pitch accent in
Huari Spanish. The Quechua “grafted” pattern is similar to phrases with accented 
words in Turkish in that it forms a pitch accent in addition to the tone(s) at the right 
edge of the phonological phrase, while the “inherited” pattern is more similar to 
the Basque case, where it is the same phrase tones that appear at the right edge 
of phrases (only if they are in focus position in Basque), and not additional ones, 
that are employed to mark the accented position in phrases with accented words. 
Turkish and Huari Quechua also share the property of having two options for the 
rightmost phrase-level tone, H, and L,, differentiated by whether the AP occurs in 
prenuclear or nuclear position in Turkish and distinguishing the rising from the
falling contour in Quechua. 

So far, we have only compared AP/PhP- and word-level prosody here. At higher 
levels of prosodic structure, further points of commonality and separation can be 
added: e.g., while Kamali (2011: 86-87) argues that in Turkish there is only a single 
higher-level boundary tone, H%, which is optional (but cf. Göksel & Kerslake 2005:
35-39 for a differing account), Japanese has an array of complex higher-level tones 
(L%, H%, HL%, HLH%) called “boundary prosodic movements” (BPM; Igarashi et al. 
2013; Igarashi 2015, cf. also Venditti et al. 2008: 471-476). They encode pragmatic 
contrasts. In Huari Quechua, there is so far no evidence for additional higher-level
tones. However, this should be taken as a preliminary finding: as Venditti et al.
(2008: 472–473) point out, the full inventory of BPMs in Japanese was until recently
overlooked (including by Pierrehumbert & Beckman 1988), not least because some of
them are exceedingly rare: in a corpus of more than 178,000 phrases, the two least
frequent BPMs, HL% and HLH%, occurred only 419 (0.2%) and 14 (0.008%) times,
respectively. Their distribution is also extremely dependent on genre and context.
It is therefore quite possible that such tones are absent from our Huari Quechua
sample, or are present but have not been recognized (cf. the singular example of KP04 MT Q 0393
in section 6.1.7). Northern Biscayan Basque and Turkish assign phrase accents
based on a mixture of prosodic and syntactic conditions: in Turkish, all APs in 
prenuclear position receive a final rise (H- in Kamali 2011); in Northern Biscayan 
Basque, even unaccented words receive a final pitch movement that is identical in 
form to the falling accent when they are the last word in a phrase that is in focus 
position, i.e. directly preceding the verb, according to Hualde et al. (2002: 549–550).
A purely prosodic version of this condition, where only the strongest word in the 
phonological phrase has what could be called an accent in a completely regular 
position, is quite a close-fitting description of the Quechua pattern seeking align- 
ment with the word penult. 
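
The point about rare boundary tones can be made concrete with a small calculation. The sketch below takes the two Japanese BPM counts cited above and asks how likely it would be to see no token at all in a smaller sample. The sample size of 1,000 phrases is a purely hypothetical value chosen for illustration, and the calculation assumes independent phrases with a constant rate, which the reported dependence on genre and context makes a simplification at best.

```python
# BPM token counts cited above from Venditti et al. (2008), for a corpus
# of more than 178,000 phrases.
CORPUS_PHRASES = 178_000
BPM_COUNTS = {"HL%": 419, "HLH%": 14}

# Hypothetical sample size (an assumption for illustration only; it is
# not the size of the Huari Quechua corpus).
SAMPLE_PHRASES = 1_000


def prob_unobserved(rate: float, n: int) -> float:
    """Probability of seeing no token in n independent phrases at a fixed rate."""
    return (1.0 - rate) ** n


for label, count in BPM_COUNTS.items():
    rate = count / CORPUS_PHRASES
    p_zero = prob_unobserved(rate, SAMPLE_PHRASES)
    print(f"{label}: rate {rate:.3%}, P(no token in {SAMPLE_PHRASES} phrases) = {p_zero:.2f}")
```

Under these assumptions, a tone as rare as HLH% would more likely than not be missing from a sample of this size, which is why the absence of higher-level tones in the data is treated as preliminary.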

Discussions of the prosody of Japanese, Basque, and Turkish often group them 
together under the label of “pitch-accent languages” because of their several shared
properties (cf. Féry 2017). Should Huari Quechua now be added to this group as a 
further member, based on the number of intriguing similarities we have encoun- 
tered? Despite these similarities, it should be kept in mind that in our data, differ- 
ent prosodic variants of both Spanish and Quechua assume prosodic configurations 
that each on their own resemble what has been described as quite diverse prosodic 
“types” in the literature. In the next section we will see that the same speaker may 
produce the same lexical item of Spanish origin in the same conversation one time 
with an “inherited” accent, another with a “grafted” one, and yet another with one
of the three patterns that would also be used on phrases composed only of words 
of Quechua origin. To the extent that Quechua prosody is similar to that of the 
three other “pitch accent” languages, this would then be more valid for the first
two instances than for the third one, while phrases produced with an exclusively 
edge-oriented alignment pattern would perhaps more closely resemble French 
or Korean. Thus I would hesitate to say that simply because we can analyze the 
prosodic behaviour of these Spanish-origin words in parallel to the distinction 
between unaccented and accented words in other languages, it is reasonable to 
classify Quechua as belonging to a prosodic “prototype” like the T-type pitch accent
languages proposed in Hualde et al. (2002). In addition, we simply do not know 
very much about the tonal properties of very many languages, so that creating
prototypes because two or three languages whose prosody we have studied a little
more seem to show parallels in some analyses might be premature. The theoretical 
cast of the analysis itself is also crucial: Igarashi (2015: 561) points out that French 
has been described both as having an accent-based prosody (e.g. by Post 2002) and 
an edge-based prosody (e.g. by Jun & Fougeron 2000, 2002), which are elsewhere 
seen as representing two different prosodic types. Moreover, the idea of prosodic
types themselves seems difficult to maintain: Hyman (2006) argues that the expo- 
nents of “pitch accent languages” are too heterogeneous for it to be a useful type, 
and Beckman & Venditti (2010) expand this assessment also to “tone languages” 
and “stress languages” after detailed case studies, voicing the suspicion “that the 
appearance of prototypes comes from looking too closely at just one or two of the 
functions in which tone participates, as well as from being thoroughly immersed 
in the consensus assumptions of specialists in just one or two Sprachbund regions” 
(Beckman & Venditti 2010: 642). Hyman (2014) essentially concurs by proposing 
property-based (instead of type-based) typological comparison. 

Perhaps the problem in assigning types lies more in assuming that they are 
valid for whole “languages” rather than in the observation itself that certain prop- 
erties seem to cluster and form recurring patterns. It would then be a much more 
interesting question to ask what it is about these properties that makes it likely or 
preferable for them to occur together in some system, independent of whether this 
system is a historical language or just one possible variant besides others in the 
repertoire of an individual. In the following we will look at the loanword patterns 
from a quantitative perspective, in order to see how they vary by speaker, token 
and type of discourse. 


6.1.8.3 Distribution of the loanword patterns across speakers and lexical items 

To conclude the section on loanword patterns we'll look at the occurrence of pitch 
movements determined by the Spanish stress position in a subset of the corpora, 
the seven Concs and Maptasks. The purpose here is not to provide a comprehen- 
sive quantification of their frequency of occurrence but to demonstrate that pitch 
movements determined by the Spanish stress position vary in our corpus at least 
along the following dimensions: according to speaker and according to lexical item; 
possibly also according to conversation type. In the following two tables, the occur- 
rence of Spanish-origin word tokens is given and their frequency relative to all 
word tokens. Their realization is classified as either without pitch movement due 
to the Spanish stress position (i.e., like any other Quechua word), with pitch move- 
ment due to the Spanish stress position (i.e. grouping the “inherited” and “grafted” 
patterns together), or as indeterminable. The counts are separated by speaker for 
the two sets of corpora of Conc (Table 22) and Maptask (Table 23). The definition of 
“word” used here is that of the Quechua morphological word: any stem together
with its suffixes. Words were classed as indeterminable if either there was no or 
not enough pitch movement in the phrase overall to classify it, or if what was found 
could be interpreted to represent one of the Quechua alignment patterns just as 
much as tonal alignment relative to the Spanish stressed syllable with no way of 
telling them apart (e.g. a rising contour on an individually phrased trisyllabic word 
where the tonal transition takes place on the penult and the penult also corresponds 
to the stressed syllable in the Spanish word). The counts here are thus necessar- 
ily the result of some interpretation, but they broadly indicate differing strategies 
between speakers. 


Table 22: Prosodic patterns on Spanish-origin word tokens in the seven Quechua Conc corpora. The
counts in the last column exclude word tokens classed as “indeterminable”.


speaker  corpus    Spanish-origin words                      total      ratio      ratio pitch
                   no pitch    pitch       indeter-   all    number of  Spanish-   movement due to
                   movement    movement    minable           words by   origin /   Spanish stress /
                   due to      due to                        that       all words  Spanish-origin
                   Spanish     Spanish                       speaker               words
                   stress      stress

OZ14     Conc      2           9           5          16     31         0.52       0.56
QZ13     (180.7)   12          9           11         32     48         0.67       0.28
QF16     Conc      4           0           3          7      48         0.15       0
SG15     (218.8)   1           7           8          16     79         0.20       0.44
TP03     Conc      2           0           1          3      39         0.08       0
KP04     (172)     4           0           3          7      70         0.10       0
AZ23     Conc      33          2           11         46     70         0.66       0.04
ZZ24     (222)     8           1           2          11     20         0.55       0.09
XU31     Conc      6           1           1          8      40         0.2        0.13
OA32     (192.3)   8           1           0          9      64         0.14       0.11
ZR29     Conc      6           0           5          11     36         0.31       0
HA30     (300.2)   7           3           0          10     117        0.09       0.3
XQ33     Conc      3           0           3          6      29         0.21       0
LC34     (193.6)   3           0           2          5      36         0.14       0

Table 23: Prosodic patterns on Spanish-origin word tokens in the seven Quechua Maptask corpora.
The counts in the last column exclude word tokens classed as “indeterminable”.


speaker  corpus    Spanish-origin words                      total      ratio      ratio pitch
                   no pitch    pitch       indeter-   all    number of  Spanish-   movement due to
                   movement    movement    minable           words by   origin /   Spanish stress /
                   due to      due to                        that       all words  Spanish-origin
                   Spanish     Spanish                       speaker               words
                   stress      stress

OZ14     MT        7           10          8          25     151        0.17       0.4
QZ13     (234.2)   2           3           5          10     80         0.13       0.3
QF16     MT        7           2           2          11     103        0.11       0.18
SG15     (84)      1           0           0          1      11         0.09       0
TP03     MT        5           1           8          14     198        0.07       0.07
KP04     (803)     17          8           14         39     301        0.13       0.21
AZ23     MT        11          2           2          15     95         0.16       0.13
ZZ24     (176)     2           2           1          5      50         0.10       0.4
XU31     MT        57          16          4          77     622        0.12       0.21
OA32     (777.6)   12          5           5          22     181        0.12       0.23
ZR29     MT        2           0           1          3      34         0.09       0
HA30     (262.8)   16          4           17         37     205        0.18       0.11
XQ33     MT        1           0           1          2      103        0.02       0
LC34     (28)      11          3           12         26     266        0.10       0.12


Some speakers (OZ14, QZ13, SG15) tend to realize a substantially higher number of
the Spanish-origin word tokens they produce with pitch movement on the Spanish 
stressed syllable than others (AZ23, TP03, ZR29, XQ33), while the rest seems to be 
in-between. However, we also see that the counts differ according to the corpus 
type (Conc or Maptask). Two factors must be noted related to the nature of the 
experimental tasks: firstly, the proportion of Spanish-origin word tokens that are 
produced also varies by speaker and corpus type, thus varying the number of 
opportunities at which either alignment pattern could be produced. Although the 
central objects in both tasks were the same and controlled for by the experiment- 
ers, speakers were still free in their choice of name for these objects and of all 
other lexical items they used in the task. Secondly, speakers speak more or less, not 
only by individual preference, but also because of the constraints of the task: in the 
Maptasks, it is usually the speaker with the path already drawn on the map who
will speak more. In Conc, it depends on the course of the game: if the speaker who
begins the game happens to memorize all or nearly all of the locations of the objects
correctly, they will speak much more than the other speaker, whereas if the game is
more balanced, their speaking portions will tend to be so as well. Table 24 gives the counts
for each speaker individually, but pools the data from both corpora together, some- 
what correcting for these two factors. 
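
The pooling step can be sketched in a few lines. The example below uses the counts for speaker OZ14 as read off Tables 22 and 23 (Conc: 2, 9 and 5 tokens out of 31 words; Maptask: 7, 10 and 8 tokens out of 151 words) and derives the pooled percentages from them. The class and function names are hypothetical and introduced here for illustration only; this is a sketch of the arithmetic, not a description of how the tables were actually compiled.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Token counts for one speaker in one corpus (hypothetical container)."""
    no_movement: int      # no pitch movement due to Spanish stress
    movement: int         # "inherited" or "grafted" pattern
    indeterminable: int   # could not be classified
    all_words: int        # all word tokens by that speaker

    @property
    def spanish(self) -> int:
        # All Spanish-origin word tokens, whatever their classification.
        return self.no_movement + self.movement + self.indeterminable


def pool(a: Counts, b: Counts) -> Counts:
    """Add up the counts of two corpora for the same speaker."""
    return Counts(
        a.no_movement + b.no_movement,
        a.movement + b.movement,
        a.indeterminable + b.indeterminable,
        a.all_words + b.all_words,
    )


def percentages(c: Counts) -> dict:
    """Percentages rounded to one decimal place."""
    return {
        "spanish_share": round(100 * c.spanish / c.all_words, 1),
        "movement": round(100 * c.movement / c.spanish, 1),
        "no_movement": round(100 * c.no_movement / c.spanish, 1),
        "indeterminable": round(100 * c.indeterminable / c.spanish, 1),
    }


# Counts for OZ14 as read off Tables 22 (Conc) and 23 (Maptask).
OZ14_CONC = Counts(no_movement=2, movement=9, indeterminable=5, all_words=31)
OZ14_MT = Counts(no_movement=7, movement=10, indeterminable=8, all_words=151)

print(percentages(pool(OZ14_CONC, OZ14_MT)))
```

Pooling the raw counts before dividing, rather than averaging the two per-corpus ratios, weights each corpus by the number of tokens it contributes, so a very short corpus cannot dominate a speaker's overall figure.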


Table 24: Ratios (in %) of Spanish-origin words against all words, and of pitch movement classified as
due or not due to the Spanish stressed syllable, or indeterminable, on Spanish-origin word tokens,
across both Conc and Maptask taken together.


speaker  corpus    Ratio Spanish-   Pitch movement due    No pitch movement         Indet.
                   origin words /   to Spanish stressed   due to Spanish stressed   (%)
                   all words (%)    syllable / Spanish-   syllable / Spanish-
                                    origin words (%)      origin words (%)

OZ14     MT+Conc   22.5             46.3                  22                        31.7
QZ13     MT+Conc   32.8             28.6                  33.3                      38.1
QF16     MT+Conc   11.9             11.1                  61.1                      27.8
SG15     MT+Conc   18.9             41.2                  11.8                      47.1
TP03     MT+Conc   7.2              5.9                   41.2                      52.9
KP04     MT+Conc   12.4             17.4                  45.7                      37
AZ23     MT+Conc   37               6.6                   72.1                      21.3
ZZ24     MT+Conc   22.9             18.8                  62.5                      18.8
XU31     MT+Conc   12.8             20                    74.1                      5.9
OA32     MT+Conc   12.7             19.4                  64.5                      16.1
ZR29     MT+Conc   20               0                     57.1                      42.9
HA30     MT+Conc   14.6             14.9                  48.9                      36.2
XQ33     MT+Conc   6.1              0                     50                        50
LC34     MT+Conc   10.3             9.7                   45.2                      45.2
Avg.     MT+Conc   17.2             17                    49.2                      33.8


Table 24 allows us to make a better judgment about differences between comparable³⁵¹
speakers. The most general conclusion is that no speaker produces all or
even more than half of their Spanish-origin words identifiably with one of the
loanword patterns (but keep in mind the sizeable number of tokens classed as
indeterminable), but probably also no speaker (with sufficient data) is entirely incapable
of it. Apart from that, reliable conclusions about genuine individual differences 
with regard to their loanword-prosodic preference can only be drawn for some
speakers: e.g. speaker AZ23 has a ratio of Spanish-origin word tokens that is
comparatively high and also roughly comparable to that of speakers OZ14 and QZ13.
All three of them also produce a decent absolute number of tokens. Yet, her ratio 
of loanword pattern realizations is the lowest of all speakers with sufficient data, 
while those of OZ14 and QZ13 are among the highest, with a difference of almost 40 percentage points between
AZ23 and OZ14. This is also not due to differences in lexical choice. The pairs of OZ14 
and QZ13, and AZ23 and ZZ24, respectively, follow very similar strategies in their 
Conc tasks: they describe the locations of the object cards in the game with reference
to the rows and columns that are laid out on the table. Other speaker pairs
instead prefer to describe the locations via relative reference to other cards (e.g. 
tsaypa hanan kaq “the one above that one”, KP04 Conc Q 0225), or to the border
of the grid (e.g. washa kantu-chaw “at the edge over there”, HA30 Conc Q 0212).
OZ14, QZ13, AZ23 and ZZ24 all refer to the rows and columns by numbers using


351 Comparable are those that produce enough words overall, with an approximately similar ratio 
of Spanish-origin words. The extremely low ratios of loanword pattern realizations by XQ33 and 
TP03 should e.g. be taken with some caution, because they use Spanish-origin words themselves 
quite infrequently, and the low and high ratios by ZR29 and SG15, respectively, because they both 
produce comparatively few words overall. 
the Spanish ordinal numerals (primer/o/a, segund/o/a, tercer/o/a) as lexical stems.³⁵²
Table 25 lists the Spanish-origin word types these speakers use in Conc together
with their token occurrence. The majority of Spanish-origin word tokens in their
two Conc corpora belong to four lexical item types: fila ‘row’ and the first three
ordinal numbers, primer(@)³⁵³ ‘first’, segund@ ‘second’, and tercer(@) ‘third’ (cf.
rows with grey background in the table).


Table 25: Spanish-origin lexical item types with their word forms and number of token occurrences in
the Quechua Conc corpora of QZ13 & OZ14 and AZ23 & ZZ24, respectively, according to speaker.


lexical item         word form                        QZ13  OZ14  AZ23  ZZ24  all
ahí ‘there’          ahí                              -     -     1     -     1
faltar ‘to lack’     falta lack.3S.PRES.IND³⁵⁴        -     -     1     -     1
                     falta-n lack-3                   -     -     1     -     1
fila ‘row’           fila                             2     -     -     -     2
                     fila-chaw row-LOC                8     5     13    3     29
cajón ‘box’          kahun-chaw box-LOC               -     -     1     -     1
lado ‘side’          ladu-n side-3                    1     -     -     -     1
                     laa-ni-n-chaw side-FON-3-LOC     -     1     -     -     1
primer-(@) ‘first’   primer                           2     -     3     1     6
                     primer@                          1     1     2     1     5
                     primer@-chaw first-LOC           1     2     5     2     10
ric@ ‘rich’          riiku                            1     -     -     -     1
segund@ ‘second’     segund@                          5     2     3     1     11
                     segund@-chaw second-LOC          2     1     4     1     8
tercer-(@) ‘third’   tercer                           2     -     1     -     3
                     tercer@                          1     2     5     1     9
                     tercer@-chaw third-LOC           5     2     5     1     13
últim@ ‘last’        ultim@-chaw last-LOC             1     -     -     -     1
ver ‘to see’         vistes see.2S.INDEF.IND          -     -     1     -     1
all                                                   32    16    46    11    105


352 Some of the other speaker pairs also refer to the rows and columns, but use relative location
terms for differentiation (punta ‘front’, waqta ‘back’) or the Quechua numerals (huk ‘one’, ishkay
‘two’, kimsa ‘three’) that do not seem to have special ordinal forms (e.g. kimsa fila-chaw ‘in the third
row’ and kimsa allqu ‘three dogs’).

353 The “@” here stands in for a vowel segment that could be either -a or -o, the Spanish gender
desinences. When the speakers produce these in a Quechua utterance, they do not pay much attention
to gender agreement, if applicable. There are also occasional cases of gender mismatch in Spanish,
e.g. segundo fila (OZ14 Conc_ES_1785). This has been noted in the speech of Quechua-Spanish
bilinguals before (Escobar 2000, 2011). These segments often have a centralized vowel quality, also
when the tonal transition takes place on them.

354 Spanish-origin morphological glosses (verb endings) are given in italics and underlined,
Quechua-origin morphology as well as all stem glosses are given in italics only.

(140) AZ23 Conc 0122


primer  fila-chaw  terceru-chaw  primer  fila-chaw  terceru-chaw  pitu
first   row-LOC    third-LOC     first   row-LOC    third-LOC     whistle


“in the third one in the first row, in the third one in the first row [is] the
whistle”


Figure 164: AZ23 Conc Q 0122³⁵⁵ (declarative with four rising and one falling contours).


This section has already seen a number of examples from QZ13 and OZ14's Conc 
with loanword pattern realizations on these words. For comparison, see AZ23 
Conc 0122 ((140)/Figure 164), where almost all of these lexical items appear in 
phrases without the Spanish stressed position as locus of pitch movement. Instead, 
they exhibit Quechua phrasal accentuation. Just like words of Quechua origin, 
Spanish-origin words like primer can be realized as part of the initial low stretch 
without any pitch events, frequently if they are in a modifying position. Thus even 
on nearly identical lexical items in the same task, there are individual speaker dif- 
ferences (between AZ23 on the one side and QZ13 and OZ14 on the other) with 


355 https://osf.io/g6qux/ 
regard to the use of loanword prosody. Conversely, the importance of lexical identity
can also be shown fairly unequivocally, in that the same speakers under the
same context conditions can treat different lexical items of Spanish origin very 
differently prosodically. This is seen when comparing the realizations of forms of 
lado ‘side’ with those of murciélago ‘bat’ in the Quechua Maptask and Conc corpora 
by speakers XU31 and OA32. Fortunately, those two speakers, upon encountering 
the conflicting section in their maps, patiently began the game anew several times, 
determined to iron out the kinks in their interaction by simply redoing everything 
again and retracing their steps. This has resulted in a Maptask of nearly 13 minutes 
of continuous conversation, with similar but varying utterances containing always 
the same set of lexical items. One of the objects causing a conflict in their maps 
was the bat, so it is quite often referred to. In addition, both speakers seem to have 
incidentally forgotten the Quechua word for ‘bat’, tsiqtsi (or only one of them forgot 
and the other followed him in that usage), so they use forms of murciélago through- 
out this long map task. In most other corpora by other speakers, tsiqtsi is used to 
refer to a bat, and its occurrence in their Cuento shows that tsiqtsi is also in XU31’s 
and OA32’s lexicon. Murciélago in their map task thus is probably something of an
ad hoc-substitute for tsiqtsi via access to the multilingual lexicon available to the 
speakers. Morphosyntactically, it is used in the corpus like any other noun, suffixed 
with Quechua nominal morphology. Murciélago is a fortunate case also in that it 
is proparoxytonic in Spanish, so that even if it is phrased as an individual word
without suffixes in Quechua, accentuation on the Spanish stressed syllable /sje/ can 
be differentiated from a phrase-final rise beginning on the penult. On the other 
side, forms of lado are used by nearly every speaker at least once in their Quechua 
data. Etymologically “native” words for ‘side’ are found in one dictionary, like awi
or niq (cf. Carranza Romero 2003: 40, 140), but in another, actually older one, laadu
is the only entry with the meaning ‘side’ (Parker & Chavez Reyes 1976: 260). In our
data as well, only forms of lado are used. There is also a shortened form, laa, but
speakers use both with the same meaning³⁵⁶ in the same task, and both forms can
be and are suffixed with Quechua nominal morphology. Note that even though lado 


356 Though perhaps not with the same morphosyntax: in this very small corpus, XU31 uses laa 
exclusively following either the demonstrative kay or the locative nouns hana ‘above’ and hawa 
‘below’, while lado is used everywhere else (see Table 26). This might suggest that kay laa, hana 
laa and hawa laa, as well as perhaps others involving basic directions, have become fused lexical 
expressions for him with the meaning ‘this side’, ‘upper side’, and ‘lower side’, respectively.
However, OA32, while using it only twice overall, produces laa once following manka ‘pot’ in manka
laa-n-chaw (pot side-3-LOC) ‘beside the pot’, which is far less plausible as a lexicalized expression.
For both it is the case that while a noun preceding a form of lado (as well as other location nouns)
may optionally be suffixed with the genitive -pa in a construction marking relative reference /
possession of the form X(-pa) ladu-n(-suffix) (X(-GEN) side-3(-suffix)) ‘X's side’, such as manka-pa
seems to have a much more comprehensive history as a loanword in Quechua than
murciélago, it is of course also an active item in the Spanish lexicon of these bilingual
speakers, and stressed “correctly” when used in Spanish by them. However, it
can be assumed that lado, both in their Spanish and in their Quechua, will have a 
much higher frequency in the daily usage of these speakers than either murciélago 
or tsiqtsi. 


Table 26: Token counts for word form types of murciélago, lado, escoba, and abeja in the Quechua Conc 
and Maptask corpora of the speaker pair XU31 and OA32 according to prosodic pattern. 


speaker  token word form                       “inherited”  “grafted”  no pitch    indeterminable  all
                                               pattern      pattern    movement
                                                                       due to
                                                                       Spanish
                                                                       stress

murciélago (in Conc and Maptask)
XU31     murcielagu
         murcielagu-ta bat-OBJ
         murcielagu-qa bat-TOP
         murcielagu-pa bat-GEN
         all
OA32     murcielagu
         murcielagu-ta bat-OBJ
         murcielagu-m bat-ASS
         murcielagu-man bat-DEST
         all
both

lado (in Conc and Maptask)
XU31     ladu-n side-3
         ladu-n-chaw side-3-LOC
         ladu-n-chaw-mi side-3-LOC-ASS
         ladu-n-pa side-3-GEN
         ladu-n-pa-m side-3-GEN-ASS
         ladu-n-pita side-3-ABL
         laa-ni-n-chaw side-FON-3-LOC
         laa-ni-n-chaw-mi side-FON-3-LOC-ASS


ladu-n “the pot's side”, the dependent-marking with -pa does not seem to occur when the possessor
is followed by laa instead of lado, which might suggest affix status for laa.
Table 26 (continued)

speaker  token word form                                “inherited”  “grafted”  no pitch    indeterminable  all
                                                        pattern      pattern    movement
                                                                                due to
                                                                                Spanish
                                                                                stress

         laa-ni-n-chaw-raa-mi side-FON-3-LOC-CONT-ASS   -            -          1           -               1
         laa-ni-n-pa side-FON-3-GEN                     -            -          3           -               3
         laa-ni-n-pa-mi side-FON-3-GEN-ASS              -            -          1           -               1
         li-n-pa-mi side-3-GEN-ASS                      -            -          1           -               1
         all                                            1            -          41          -               42
OA32     ladu                                           1            -          -           -               1
         ladu-n side-3                                  -            -          1           -               1
         ladu-n-chaw side-3-LOC                         -            -          8           -               8
         laa-n-chaw side-3-LOC                          -            -          1           -               1
         laa-ni-n-chaw side-FON-3-LOC                   -            -          1           -               1
         all                                            1            -          11          -               12
both                                                    2            -          62          -               64

escoba (in Conc and Maptask)
XU31     escoba-man broom-DEST                          -            -          3           -               3
         escoba-pa broom-GEN                            -            -          1           -               1
         all                                            -            -          4           -               4
OA32     escoba-pa broom-GEN                            -            -          1           -               1
both                                                    -            -          5           -               5

abeja (in Conc and Maptask)
XU31     abeja-kuna bee-PL                              -            2          -           -               2
         abeja-ta bee-OBJ                               -            -          1           -               1
         all                                            -            2          1           -               3
OA32     abeja-kuna bee-PL                              1            -          -           -               1
both                                                    1            2          1           -               4

Table 26 gives the counts for word form types of murciélago and lado in the Quechua 
Conc and Maptask corpora by XU31 and OA32, sorted according to the prosodic 
pattern realized on them. While overall counts differ of course, there is a very clear 
trend: phrases containing word forms of murciélago (usually produced with a high 
back vowel in the last segment) are realized with one of the patterns sensitive to the
Spanish stress position more than half of the time (13/20 times = 65%) here. Phrases
containing forms of lado almost never are (2/64 times ≈ 3%). This is fairly solid
evidence that even with the same speakers in the same task, lexical identity is a
relevant factor for the prosody of Spanish-origin words in Quechua. Relating this to
their history sketched above, the obvious hypothesis seems to be that with increas- 
ing integration into the Quechua lexicon, the marked prosodic patterns sensitive 
to Spanish stress become less frequent and the prosody thus more regular. The 
table also includes counts for realizations of forms of escoba and abeja, with the 
former never being produced with a pattern sensitive to Spanish stress, and the 
latter three out of four times. While the absolute numbers here are far too low to 
draw any definite conclusions, this suggests that the difference in prosodic realiza- 
tion is not specific to murciélago and lado, but is instead more generally influenced 
by lexical identity. It is perhaps noteworthy that most of the other speakers use 
pitsana in Quechua for ‘broom’, but no speaker in the data refers to bees with any 
other word than abeja. Other speakers, including even AZ23 with her overall low 
preference for the loanword pattern, also produce the “grafted” pattern on it ((128)/
Figure 157). This seems to contradict the hypothesis of increasing prosodic regu- 
larization concomitant with lexical integration of loanwords, but perhaps abeja is 
simply infrequent overall. More subtle factors are clearly at play here than just the 
two we have been able to nail down, speaker preference and lexical identity. I leave 
a more in-depth exploration of them for further research. 
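
The contrast between murciélago and lado can also be checked numerically. The sketch below recomputes the two proportions from the totals cited above (13 of 20 and 2 of 64 tokens) and adds a one-sided Fisher's exact test. The test is not part of the analysis presented here; it is included only as an illustration, and it treats tokens as independent observations, which they are not, since they come from the same two speakers in the same recordings. The function names are hypothetical.

```python
from math import comb


def share(hits: int, total: int) -> float:
    """Percentage of tokens realized with a stress-sensitive pattern."""
    return 100.0 * hits / total


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(first-row hits >= a) under the hypergeometric null for [[a, b], [c, d]]."""
    row1 = a + b          # tokens of the first lexical item
    col1 = a + c          # stress-sensitive tokens over both items
    n = a + b + c + d     # all tokens
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Totals cited in the text: stress-sensitive vs. other tokens per item.
MURCIELAGO = (13, 7)   # 13 of 20 tokens
LADO = (2, 62)         # 2 of 64 tokens

print(f"murciélago: {share(13, 20):.1f}%")
print(f"lado: {share(2, 64):.1f}%")
print(f"one-sided Fisher p = {fisher_one_sided(*MURCIELAGO, *LADO):.2e}")
```

With counts this lopsided the p-value is very small, but given the non-independence of the tokens it should be read as a rough indication of how strong the asymmetry is rather than as a formal result.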

In conclusion, the tonal alignment patterns showing sensitivity to the “inherited”
lexical stress position in Spanish-origin words are a marked but not infrequent
and presumably stable prosodic option for bilingual Huari Quechua speakers. Their
frequency of realization has been shown to vary between speakers and individual
lexical items. Typologically, it seems to occupy a kind of niche in that the lexical 
alternation between words specified for an irregular accent position and those that 
lack this specification in a language otherwise employing tones mostly for the opti- 
mization of phonological/accentual phrases finds its counterpart in several unre- 
lated languages across the world, whose prosodic systems in other aspects also 
display similarities to what we have found for Quechua to greater or lesser extent. 
Any language that shows this phenomenon straddles a typological feature bound- 
ary: it both has and hasn't a lexically specified accent position for words which 
attracts tones realizing a pitch accent. 


6.2 Phrasing and relation to meaning-based categories 


Until now, the prosodic patterns attested in the Huari Quechua corpus have been categorized
only according to characteristics necessary for their prosodic definition. An important
further aspect is the relation to meaning-based categories and the signalling of differential
information structure. This section will explore this issue based on a sub-corpus
of 415 utterances from 1 Maptask (TP03&KP04), 1 Cuento (AZ23&ZZ24), and 7
Conc (TP03&KP04, QZ13&OZ14, SG15&QF16, AZ23&ZZ24, ZR29&HA30, XU31&OA32,
XQ33&LC34) tasks. These constitute nearly all the utterances produced in these
corpora; excluded were only a small number because of very quiet (whisper-like)
speech, too much background noise, or an abundance of hesitations and false starts.
The utterances were analyzed and broadly annotated by hand according to their 
morphosyntax, information structure (explicit or likely implicit QUD, focus-back- 
ground division and information status of referents (given/new)), and intonation. In 
terms of intonation, this meant annotating tonal contours according to the patterns 
established in the previous sections and how they mapped onto words. All together, 
this provides insight into what information structural and syntactic configurations 
pattern frequently with which contours, and thus what factors might contribute to
contour choice and to whether a contour is realized across one or several words (phrasing).
In turn, this will allow us to form hypotheses about prosodic and metrical structure 
and its mapping to words in Quechua. The sub-corpus is quite heterogeneous, com- 
prising data from 14 speakers engaging in three different communicative tasks in 
spontaneous speech. To give one example, overall only 120 of these 415 utterances 
contain verbs,³⁵⁷ while the other 295 are made up only of nouns (and a few adverbs
and particles). However, of those 120, 98 stem from the two Maptask and Cuento 
corpora, leaving only 22 for the seven Conc corpora. This is because largely, conver- 


357 A variety of Quechua suffixes allow conversion between the two main word classes, verbs 
and nouns, so that the resulting word form can then take suffixes exclusive to the resulting word 
class. Accordingly, a word form having undergone conversion to a noun from a verb (eligible to 
take nominal suffixes and behave syntactically like a noun, e.g. being argument to some predicate) 
is counted as a noun, and vice versa. This is not a central issue here, but there are quite complex 
cases where verbs with filled argument slots are nominalized and serve then again as arguments 
in higher-order predicates, such as in (i). 


(i) ZZ24 Cuento Q 1825
huqta chuspita wañu-tsi-nqa-n-pa alma-n-ta qara-sha
six fly-OBJ die-CAUS-NMLZ-3-GEN soul-3-OBJ give-PRTCP
“[he] gave [him] the souls of the six flies he had killed”


Here, huqta chuspita is the direct object to the predicate wañutsinqanpa, which is nominalized by
-nqa and the possessive modifier (indicated by the genitive -pa) to almanta, in turn the direct object
of the main verb qarasha, whose subject and indirect object are both null and only provided by
context. Translating this as a relative clause with the six flies as the head noun in English seems natural,
but it is the nominalized verb form which is the head in the Quechua sentence (evidenced by the
genitive which indicates the relation between the souls and those they belonged to). This structure
cannot be faithfully rendered in English. An approximation would be something like “he gave the
souls of those he had killed (the six flies)”, with the object of the subordinate verb in parentheses
and an additional pronominal relative head. What is nominalized here is not just the verb wañutsi-,
but the entire clause huqta chuspita wañutsi-. Such cases make it clear that “nominalization” in
Quechua is perhaps a misnomer, or at least a rather complex issue. Cf. also Lefebvre & Muysken (1988).
sational moves in Conc are of the broad form “[noun phrase denoting a location] is
[noun phrase denoting an object]”, with the copula in the third person present tense
not realized. This heterogeneity, in itself interesting, makes it difficult to draw certain 
generalizations from the entire dataset. In the following we will look at correlational 
patterns taken from subsets of the data that are within themselves comparable to a 
degree, and also explore further sources of variation existing between the corpora. 


6.2.1 Brief excursion: The question of (polar) questions 


In this section, I will briefly discuss and exclude the possibility that the rising contour 
is used exclusively or even mainly to signal questions, in particular polar questions. 
Wh-questions are not realized with a fall any more often than declaratives, as far as
could be determined, and they do not differ noticeably from them intonationally.
Note that the wh-word in most cases is phrased on its own with a (rising-)falling
contour (cf. Figures 132, 148, 165).


(141) CJ35 Quien Q 0163 
imanirtaa gerra-qa ka-sh imanirtaa pelya-ykaa-yaa-nqa
why-DETVAR war-TOP COP-PRTCP why-DETVAR fight-PROG-PL-3.FUT
“why was there a war? why would they be fighting?”


Figure 165: CJ35 Quien Q 0163³⁵⁸ (two wh-questions).


358 https://osf.io/wq9s5/ 
Regarding rising contours in polar questions, I investigate their occurrence using
data from the Quien task, which was also used in section 6.1.6. It was originally
intended to elicit questions in particular. Morphology and particles play a role for
polar questions in Huari Quechua: the morpheme -ku is often associated with polar
questions and can attach to any word in a sentence; if -ku is present in a sentence,
the evidentials -m(i)/-sh(i)/-ch(i) cannot be (Parker 1976: 148-149). It also occurs in
alternative questions (cf. Figure 166): here both alternatives are marked with -ku, and
optionally the Spanish o ‘or’ is inserted between them. The first alternative
is realized with a rising contour, and the second with a falling one that often
seems downstepped in pitch level.


(142) CJ35 Quien Q 0349 
ruku-na-ku hoobin-lla-raq-ku tsay runa-qa 
old-DISC-Q  young-LIM-CONT-Q DEM.DIST person-TOP 
“is the person old already or still young?”


Figure 166: CJ35 Quien Q 0349 (alternative question).


(143) XQ33 Quien Q 0059 
aqtsa-sapa-ku ka-shqa 
hair-AUGM-Q COP-PRTCP 
“did s/he have a lot of hair?”


Figure 167: XQ33 Quien Q 0059 (neutral polar question with -ku).


Also relevant for questions is the particle aw, which when used sentence-initially 
means “yes”, but in our corpus is more often used utterance-finally as a question 
tag (cf. also section 6.1.7). Table 27 gives counts for occurrences of polar questions 
in Quien, separated by type of question. Figure 167 gives an example for a neutral 
polar question. “Neutral” polar questions are those where the speaker is inquiring
about the truth value of a proposition introduced by the root sentence, when no
prior knowledge regarding either outcome can be assumed for the speaker or to
be present in the common ground (here based on discourse context observation,
cf. Farkas & Bruce 2010: 94-95; Dayal 2016: 4). “Biased” are all manner of polar
questions with an epistemic bias in the context, accessible to the speaker either 
because of a previous utterance or via inference from the common ground, e.g. 
types of clarification questions and confirmation-seeking questions (cf. Farkas & 
Bruce 2010; Lupkowski & Ginzburg 2016). The category was here chosen to be fairly 
broad. The table compares both types of question against the presence of -ku and 
aw, and the realization with a rising contour on the main utterance and separately 
on aw. The counts in the table broadly confirm findings reported by Cole (1982) 
and O'Rourke (2005) on other varieties of Quechua that polar questions are not in 
general realized as rises. However, they also offer interesting qualifications of that 
general result. It seems that in particular, neutral polar questions without -ku are
prone to being realized with a rising contour, while both -ku and a biased question
make a rising realization less likely (cf. Muntendam 2017 for similar findings on 
this association in Southern varieties of Quechua). 
Table 27: Occurrences of polar questions in the Quien corpus, ordered by type of
question, co-occurrence with the suffix -ku, utterance-final use of particle aw, and
utterance-final realization with a rise or fall.


Type of polar   -ku    n   with aw   final rise on     rise on aw
question                             main utterance
                                      yes     no

Neutral          +    13      0         3     10          --
                 -    10      0        10      0          --

Biased           +     4      3         1      3           0
                 -    13      3         2     11           2

Both             +    17      3         4     13           0
                 -    23      3        12     11           2

Total                 40      6        16     24           2
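
The proportions behind these observations can be recomputed from the counts in Table 27. The following Python lines are only an illustrative sketch: the dictionary layout and the names table27 and rise_share are assumptions introduced here and are not part of the original analysis.

```python
# Counts transcribed from Table 27 (Quien corpus).
# Key: (question type, presence of -ku); value: (n, final rise, final fall).
# The layout and all names are illustrative assumptions.
table27 = {
    ("neutral", True): (13, 3, 10),
    ("neutral", False): (10, 10, 0),
    ("biased", True): (4, 1, 3),
    ("biased", False): (13, 2, 11),
}


def rise_share(rows):
    """Percentage of questions in `rows` realized with a final rise."""
    rows = list(rows)
    total = sum(n for n, _, _ in rows)
    rises = sum(rise for _, rise, _ in rows)
    return round(100 * rises / total, 1)


# Sanity check: rises and falls add up to n in every cell.
assert all(n == rise + fall for n, rise, fall in table27.values())

for (qtype, has_ku), row in table27.items():
    label = f"{qtype}, {'with' if has_ku else 'without'} -ku"
    print(f"{label}: {rise_share([row])} % rising")

print(f"all polar questions: {rise_share(table27.values())} % rising")
```

On these counts, only neutral questions without -ku are rising throughout, while every other cell stays at or below a quarter, and rises make up well under half of all polar questions.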


Regarding aw, its use seems restricted to biased questions, supporting the hypothe- 
sis based also on observing it throughout the Huari Quechua data that it serves as a 
question tag for confirmation-seeking questions, with a similar meaning to Spanish 
tags no, verdad, or eh, German oder, or English innit or huh. Inspection of individual 
examples reveals that when aw is integrated into the preceding phrase, it is usually 
not realized as a rise (but recall Figure 155 from section 6.1.7), but if it is produced 
in a separate phrase, that phrase is always realized with a rising contour. Individual 
examples like (144) suggest that aw and the realization with a rise further differen- 
tiate within biased questions. 


(144) SG15 QF16 Cuento Q 1233-1382 (context for Figures 168 and 169)


time (seconds)    SG15    QF16

123.3   ima-taq tsay runa ka-rqa-n
        what-DETVAR DEM.DIST person COP-PST-3
        what was that man
124.6   hampikuq aw
        healer yes
        a healer right
125.4   hampikuq
        healer
        a healer
126.1   hampikuq runa
        healer person
        a healer
127.0   y naani-qa
        and path-TOP
        and what about the path
128.6   karu-tsura ka-ra-n o
        far-CONJ COP-PST-3 or
        was it rather far, or
129.7   karu-sh ka-naa
        far-REP COP-PST.REP
        it was far (it is said)
130.6   karu
        far
        far
131.0   mh-[mh]
131.2   [y] mirkapa-ta ima-ta-taa apa-naa
        and provisions-OBJ what-OBJ-DETVAR bring-PST.REP
        and what did he bring as food
133.0   (3.3)
136.3   mirkapa-ta apa-sha
        provisions-OBJ bring-PRTCP
        he brought food
137.6   aha
        yes


(144) is an excerpt from Cuento by SG15 and QF16. QF16, a school teacher, has just 
told the story to SG15 for the first time. Instead of waiting for her to re-tell it by 
herself, he asks her a number of questions to elicit the main points of the story. 
At 123.3, he asks her what kind of man the protagonist was, and at 124.6 suggests 
an answer himself, hampikuq aw “a healer, right?” (Figure 168). This is clearly a
question with the highest possible confirmation bias, as he himself just told her 
the story about which he is asking her now, i.e. he is both asking the question and 
has the epistemic authority to decide whether the proposition is true or not. This is 
produced with a falling contour on the main part of the question, and a sharp rise 
on aw, typical for this use of aw when phrased separately. 
Figure 169: SG15_Cuento_Q_1363 (clarification-seeking polar question).


Later, at 131.2, QF16 asks what kind of food the healer had taken as provision on 
his journey. SG15 responds at 136.3 with the polar question mirkapata apasha “he 
brought food?” (Figure 169). This is also a question biased towards confirmation 
(that is clear from QF16’s response via confirmation token at 137.6), but it is also 
very different: in terms of epistemics, SG15 has no authority to decide whether the 
proposition she has asked about is true. The information she asks for she has just 
received in the form of a presupposition in QF16’s preceding wh-question and she 
asks the person who is its source about the presupposition, whether she is correct 
in assuming that the presupposed proposition should be entered into the common ground (CG). The two
utterances are also different in terms of their relative positions in the discourse 
(cf. Farkas & Bruce 2010): QF16's question at 124.6 is a provocation (an elaboration 
on the directly preceding turn, which is also a provocation), while SG15's question 
at 136.3 is a response to another question. More specifically, it is a clarification 
request for intended content, one of several types of such query responses accord- 
ing to the typologies proposed in Ginzburg (2012: 148-155); Lupkowski & Ginzburg 
(2016). Formally, it is also different, in that it is realized without aw, and with a 
rise on the question itself not strictly aligned to just one syllable. From the Huari 
Quechua corpus known to me, it seems that aw as a tag never occurs in contexts 
such as the one of SG15’s question, with a strong epistemic imbalance and as part 
of a clarification request asking about a proposition formulated as a presupposition
but received as new information.³⁶⁴ Such clarification requests are also rarely
realized with -ku or without a rise, in contrast to neutral polar questions, which
show -ku with falling intonation in nearly half the cases.

The connection between epistemic bias, the position of a turn in the discourse 
as provocation or response, whether a question targets at-issue or non-at-issue 
material of a preceding move, the presence of -ku and aw, and the prosodic realization
of polar questions should be a promising field for further research to uncover
whether any more restrictive correspondence actually exists and which semantic 
factors are really relevant. Here however, the findings from Table 27 must suffice to 
indicate that polar questions are not always associated with rises. The converse is 


364 If that is true it would suggest that questions with aw as tag are similar in meaning to some 
English/German/Spanish tag questions which would also not be felicitous in such a context: the 
a-responses with tag in (i) that seem infelicitous when the proposition they ask about has been 
entered (as a presupposition) into CG by the provocation, presumably because they express a 
kind of inference based on available unconflicting information and are thus odd when the infor- 
mation they represent as inferred has in fact just been given. In contrast, other forms of biased 
questions are felicitous in such contexts: the b- and c-responses, that seem to express some kind 
of inconsistency of the proposition presupposed by the provocation with previous information, 
with the degree of this “incredulity”/“counterexpectation” arguably even stronger in the
c-responses.


(i) A: What did she bring as food for the journey? / Was hat sie für Proviant mitgenommen? / ¿Qué
llevó de fiambre?
a. B: She brought food, #right/#didn’t she? / Sie hat Proviant mitgenommen, #oder? / ¿Llevó
(algo de) fiambre, #no/#verdad?
b. B: So she brought food? / Also hat sie Proviant mitgenommen? / ¿Entonces llevó fiambre?
c. B: She brought food, did she now? / Hat sie jetzt Proviant mitgenommen? / ¿Así que llevó
fiambre?
also not true, as both previously seen examples and the results in the next sections
show that rising contours on phrases and utterances in the Conc, Maptask, and 
Cuento corpora — overwhelmingly declaratives — are very frequent. Their distribu- 
tion is instead governed by other factors. 


6.2.2 Moves with rising and falling contours in Conc 


This section investigates how rising and falling contours are distributed across 
the Conc corpora (cf. the description in section 2.4). Following the entire discourse 
in Conc allows for keeping track of the information status and level of activation 
of the individual referents represented by the pictures and their locations quite 
well. As mentioned above in 6.2, conversational moves realized in Conc in Quechua 
very often consist of a specification of a location at which speakers believe that a 
depicted item is, or a specification about the item that they believe to be at that 
location. Logically, and permitted by the relatively free word order of Quechua, we 
would expect the sequential realization of such conversational moves to form a 
coherent guessing sequence to follow one of two patterns: a) Location — Item, or b) 
Item — Location, where Item and Location stand for simple or complex expressions 
encoding the item and the location, respectively. Sometimes both of these compo- 
nents together are produced as one utterance, but often a move, especially one spec- 
ifying the location, will consist of several parts that are all individual utterances, 
and sometimes there are simply longer silent breaks between the utterances realiz- 
ing the component moves, so that there is no one-to-one correspondence between 
moves and utterances.³⁶⁵


365 As a working definition for “utterance”, I take a very simple temporally and phonetically
oriented one: a chunk of speech which a speaker produces in one go, without larger breaks or
hesitations in between. Thus an utterance defined this way may or may not perfectly overlap
with a conversational move, defined via criteria of discourse meaning, or a sentence, defined via
syntactic criteria. It also does not necessarily have to be coterminous with an IP or other larger
phonological unit, e.g. if it is an aborted or interrupted utterance, although I assume an utterance
that is successfully produced as planned by the speaker to contain one or integer multiples of one
IP (by definition of what an IP is). It is also separable into smaller chunks - phrases - according 
to prosodic criteria. 
(145) KP04 Conc_Q_0634-0658³⁶⁶ (example conversational move sequence pattern a)


time
63.4    (L H)
        tsay punta-yki ka-q-lla-chaw
        DEM.DIST front-2 COP-AG-LIM-LOC
        the one in front of you
64.5    (LH)
        aha
65.0    (H )
        tsay wanupakush
        DEM.DIST burial
        that's the burial

(145) is an example for such a conversational move sequence following pattern 
a), specifying first the location and then the item found at that location (the image 
on the card when turned around). At 63.4, the speaker begins the sequence by 
naming the location in one utterance realized in a single rising phrase. He then 
confirms at 64.5, after the game master has moved his hand towards the correct 
card, and at 65.0 he names the item he believes to be shown on the card at that 
location using a rise-falling contour on the phrase realizing the item and conclud- 
ing the sequence. Here it is still counted as one sequence if a speaker specifies 
the location using several utterances, or if they follow the first utterance naming 
the item by additional ones giving a further description of it. For the purposes of 
counting, I also take additional utterances specifying the location after the item 
has been named (if the game master has not yet reached the intended card) but 
before the card has been turned around as belonging to the same sequence fol- 
lowing pattern a). 


366 https://osf.io/rnq5y/ 
(146) OA32 Conc Q 1198-1213³⁶⁷ (example conversational move sequence pattern b)


time
119.8   (LH L)
        pinkullu
        flute
        the flute
121.3   (LH LHL)
        tsuku-pa ladu-n-chaw
        hut-GEN side-3-LOC
        beside the hut


(146) is an example for a conversational move sequence following pattern b). The 
speaker first names the item with a rise-falling contour (in 119.8), and then speci- 
fies its location with an utterance consisting of first a rising and then a rise-falling 
phrase (in 121.3). For this pattern, further utterances referring to the same item, 
occurring after the move specifying the location but before the card has been 
turned around, are also counted as belonging to the same move sequence. 

Table 28 shows the number of move sequences in the seven Quechua Conc 
corpora separated by speaker pairs and according to the sequential pattern they 
follow. 12 item cards were used in each game. The diverging numbers of move
sequences between speaker pairs reflect the varying number of false guesses.


Table 28: Number of conversational move sequences in seven Quechua Conc corpora.


               move sequences   order Loc-Obj   order Obj-Loc   order not
               overall³⁶⁸       (pattern a)     (pattern b)     applicable³⁶⁹

TP03 & KP04         17               16               1              0
SG15 & QF16         17               12               5              0
QZ13 & OZ14         15               15               0              0
AZ23 & ZZ24         16               16               0              0
XU31 & OA32         18                0              17              1


367 https://osf.io/5j46n/ 

368 Counted here are only move sequences with which a guess about a location and an item at 
that location was made, not moves peripheral to the game, such as questions, banter between the 
speakers, etc. Those only make up a very small part of the spoken content of any of the Conc cor- 
pora and are ignored here. 

369 One move consisted of the speaker only saying the name of the item, at the moment when all
other cards had been solved already, so that no sequence exists. 
Table 28 (continued)

               move sequences   order Loc-Obj   order Obj-Loc   order not
               overall          (pattern a)     (pattern b)     applicable

ZR29 & HA30         21               20               1              0
XQ33 & LC34         12               12               0              0
All                116               91              24              1
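
The totals and the pair-wise preferences in Table 28 can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the names table28 and majority_pattern and the tuple layout are hypothetical and introduced here only for illustration.

```python
# Counts transcribed from Table 28.
# pair -> (pattern a: Loc-Obj, pattern b: Obj-Loc, order not applicable)
# The layout and all names are illustrative assumptions.
table28 = {
    "TP03 & KP04": (16, 1, 0),
    "SG15 & QF16": (12, 5, 0),
    "QZ13 & OZ14": (15, 0, 0),
    "AZ23 & ZZ24": (16, 0, 0),
    "XU31 & OA32": (0, 17, 1),
    "ZR29 & HA30": (20, 1, 0),
    "XQ33 & LC34": (12, 0, 0),
}

# Column totals over all speaker pairs.
total_a, total_b, total_na = (sum(col) for col in zip(*table28.values()))
total = total_a + total_b + total_na


def majority_pattern(counts):
    """Return 'a' or 'b', whichever order a pair uses more often."""
    a, b, _ = counts
    return "a" if a >= b else "b"


# Share of pattern a) among all guessing sequences, in percent.
share_a = round(100 * total_a / total, 1)
pairs_preferring_b = [p for p, c in table28.items() if majority_pattern(c) == "b"]

print(total, total_a, total_b, total_na)
print(share_a, pairs_preferring_b)
```

The column sums also show that exactly one sequence falls under “order not applicable”.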


As seen from Table 28, overall, sequential pattern a) occurs with much greater fre- 
quency than pattern b). However, this is also a matter of speaker (pair) preference: 
only the speaker pairs SG15 & QF16 and XU31 & OA32 use pattern b) more than 
once, XU31 & OA32 never using pattern a), which is preferred by all other speaker 
pairs.³⁷⁰ That the patterning correlates so well with speaker pairs might point to
these patterns reflecting at least partially two different available strategies for 
reducing the context set (cf. Roberts 2012b). That is to say, if the task of the game is 
modeled as attempting to answer the super-QUD “What item is in which location?”,
then one strategy is to find answers to sub-QUDs that are formed using the elements 
of the set of possible locations as fixed variables imposing a referential restriction 
(≈ topics, Kuno 1982; Roberts 2011), and to ask for the corresponding items as open
variables; and the other works the other way around, forming sub-QUDs with the 
items as fixed variables and asking for their corresponding locations. A reason for 
the preference of pattern a) over b) by most of the pairs might be that the setup of 
the game actually induces a bias for this, because with the cards lying face down 
on the table, the positions themselves (such as “the first card in the third row”)
are already established and unchangeable, while it is the identities of the cards at 
these positions which are in doubt. Furthermore, the locations are continuously 
present in the extralinguistic context, while the items' names have to be actively 
recalled by the speakers. In other words, the locations can be seen as more active 
relative to the cards' identities, and thus would lend themselves to be linguistically 
realized as given and topical material, i.e. signaling a QUD structure which asks 
for the identities of the items using the elements of the set of locations as anchors 
(= topics; assuming that information is packaged in the somewhat iconic order 
old-new / theme-rheme in Quechua (cf. Weber 1989: 427, who argues this for the 
related Huallaga variety of Quechua I)). Additionally, naming an item for the first
time before the card has been turned around can be seen as making a separate 
additional discourse commitment (in the sense of Farkas & Bruce 2010) from when 


370 In Spanish Conc the speakers also broadly follow these preference patterns, with XU31 & OA32 
preferring pattern b) over pattern a), while the other speaker pairs mostly follow pattern a). 
a location is first named: because the set of items is not constantly available for
inspection (the cards are turned upside down on the game table), naming an item 
makes a commitment to that item actually being a member of the set (i.e., remem- 
bering its presence correctly from before). A speaker can be just as wrong about 
this as about the separate discourse commitment that is made when relating an 
item to a location. Following pattern b) means making this discourse commitment 
initially as an existential presupposition, separately from the assertion that relates 
the item to the location. In pattern a) on the other hand, whatever cues might be 
used to mark this assertion (e.g. falling intonation, morphological marking, or word 
order) can conceivably be marked once on the last element in the sequence (= the 
expression denoting the item) and be understood to apply to both discourse com- 
mitments made. Note however that also in pattern a) the location is normally at 
least specified in each utterance anew; there is usually no topic continuity from 
a previous utterance. This can have a reflex in the utterance form, as argued in 
section 6.4.3. In this view, it would probably make sense for speaker pairs to collab- 
oratively follow one or the other strategy, because this allows for more economic 
signaling, and perhaps also for exploiting deviations from the established pattern 
in order to signal a marked information structure. Hypotheses along these lines 
will be explored further below, but for now note that the only speaker pair where
the majority pattern is not the one established by the first speaker's first move is 
SG15 & QF16, who are probably the two speakers least acquainted with each other 
from outside of the experiments.³⁷¹

Table 29 relates conversational moves to intonation by counting all utterances 
belonging to a move sequence matching a location to an item, separated accord- 
ing to the corpus, whether they are part of a location- or item-specifying move 
and whether they are rising or falling. For the purposes of this table, utterances 
consisting of several phrases (with an identifiable pitch movement on each) were 
counted as rising or falling if the last phrase was rising or falling. If a single utter- 
ance realized both a location-specifying and an item-specifying move, the decision 
before the analysis was to evaluate the two moves separately if there was a clear 
phrase boundary separating the expressions in the utterance along that line. As it 
turned out, this was nearly always unambiguously the case, i.e. there was always 
an identifiable phrase boundary at the boundary between an item-specifying and 
a location-specifying move, evidence in itself that the division between these types 
of moves is reflected in the prosodic realization. Thus this table counts rising or 


371 While the other pairs are all either siblings or friends of a similar age who volunteered as pairs 
for our experiments, QF16 and SG15 are a school teacher and a cook who did not know each other 
well before being paired for the experiment, each having volunteered individually.
falling moves, under the assumption that the final phrase in an utterance signaling
such a move has priority. Further below, we will also look at the composition of
utterances into phrases and their prosodic contours.³⁷²


Table 29: All moves that are part of a guessing sequence in the seven Quechua Conc corpora,
separated according to whether they specify a location or an item, and whether they are realized
finally rising or falling.


               loc    loc    loc         obj    obj    obj      all   total  total   % rises  % falls  % rises  % falls
               rise   fall   indet.³⁷³   rise   fall   indet.         loc    obj     loc      loc      obj      obj

TP03 & KP04     14      3      2           2     17      0       38    19     19      73.7     15.8     10.5     89.5
SG15 & QF16     13     12      2           2     13      1       43    27     16      48.2     44.4     12.5     81.3
QZ13 & OZ14     22      4      0           1     15      0       42    26     16      84.6     15.4      6.3     93.8
AZ23 & ZZ24     31      1      2           2     16      1       53    34     19      91.2      2.9     10.5     84.2
XU31 & OA32      3     23      5           8     11      2       52    31     21       9.7     74.2     38.1     52.4
ZR29 & HA30      6     19      2          12     12      1       52    27     25      22.2     70.4     48       48
XQ33 & LC34     13      0      0           0     12      2       27    13     14     100        0        0       85.7
All            102     62     13          27     96      7      307   177    130      57.6     35       20.8     73.9
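
The totals and percentage columns of Table 29 follow from the six raw counts per speaker pair. The sketch below shows one way to recompute them; the names table29, pct, and summarize are hypothetical and introduced here only for illustration, and single values may differ from the printed ones in the last digit because of rounding.

```python
# Raw counts transcribed from Table 29.
# pair -> (loc rise, loc fall, loc indet., obj rise, obj fall, obj indet.)
# Assumption: the obj-rise count for ZR29 & HA30 is taken to be 12, the value
# consistent with that pair's item total (25) and the column total (27).
table29 = {
    "TP03 & KP04": (14, 3, 2, 2, 17, 0),
    "SG15 & QF16": (13, 12, 2, 2, 13, 1),
    "QZ13 & OZ14": (22, 4, 0, 1, 15, 0),
    "AZ23 & ZZ24": (31, 1, 2, 2, 16, 1),
    "XU31 & OA32": (3, 23, 5, 8, 11, 2),
    "ZR29 & HA30": (6, 19, 2, 12, 12, 1),
    "XQ33 & LC34": (13, 0, 0, 0, 12, 2),
}


def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 for an empty category."""
    return round(100 * part / whole, 1) if whole else 0.0


def summarize(counts):
    """Derive the totals and percentage columns from six raw counts."""
    loc_rise, loc_fall, loc_ind, obj_rise, obj_fall, obj_ind = counts
    loc = loc_rise + loc_fall + loc_ind
    obj = obj_rise + obj_fall + obj_ind
    return {
        "all": loc + obj, "loc": loc, "obj": obj,
        "rise_loc": pct(loc_rise, loc), "fall_loc": pct(loc_fall, loc),
        "rise_obj": pct(obj_rise, obj), "fall_obj": pct(obj_fall, obj),
    }


# Column sums give the "All" row; rises and falls are then pooled over
# location and item moves for the overall comparison.
totals = tuple(sum(col) for col in zip(*table29.values()))
overall = summarize(totals)
rises, falls = totals[0] + totals[3], totals[1] + totals[4]

for pair, counts in table29.items():
    print(pair, summarize(counts))
print("All", overall, "rises:", rises, "falls:", falls)
```

Keeping the raw counts separate from the derived columns makes it easy to see that the indeterminable moves account for the percentage points missing to 100 in each row.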


Broadly speaking, location-specifying moves have a majority tendency to be rising
(57.6 %), while item-specifying moves have a stronger tendency to be falling
(73.9 %). Because this tendency is so much stronger for item-specifying moves, falls are
slightly more frequent than rises overall (158 vs 129, or 51.5 % vs 42 %), even though
location-specifying moves are more frequent than item-specifying ones (177 vs 130,
or 57.7 % vs 42.4 %). One reason why there is no stronger correlation between rises
and location moves on the one hand, and falls and item moves on the other, could 
be that there are speaker pairs who preferentially produce pattern b), and that 
actually rises occur at a non-final position in the sequence and falls at a final one, 
independent of the move type. That would mean that rising contours are effectively 


372 See section 6.4.3 for a more in-depth look at the Conc moves by one speaker pair, ZR29 & HA30. 
373 Moves were classed as “indeterminable” when they were realized either totally flat, with no
discernible pitch movement that could be related to the contours established, or when large parts were
voiceless. The indeterminables make up the missing percent to 100 in the last four columns in the table. 
instances of what has been called “continuation rises” for many languages (cf. Chen
2007: 108; cf. Beckman et al. 2002; Aguilar et al. 2009; Estebas Vilaplana & Prieto 
2010 for the use with regards to Spanish), i.e. final rises or high pitch signalling 
that the current topic is to be continued or that the speaker intends to maintain the 
turn. The cross-linguistic tendency for this has been hypothesized to result from the 
fact that pitch will tend to be higher at the beginning of an utterance due to height- 
ened subglottal pressure because air is typically inhaled before speaking. Thus the 
expected values initial high pitch and final low pitch are supposed to signal a new 
topic and finality, respectively, while conversely, initial low and final high pitch, are 
cues signaling continuation (cf. the “production code” hypothesis in Gussenhoven
2002a: 51-52, 2004: 89-90). The prediction would be that rising contours occur on 
location moves and falls on item moves for speakers following pattern a), and vice 
versa for speakers following pattern b). Four of the seven speaker pairs, all following
pattern a) (TP03 & KP04, QZ13 & OZ14, AZ23 & ZZ24, and XQ33 & LC34), bear out this
prediction, suggesting that the association of rising contours with continuation and 
falling contours with finality is generally correct. However, the data from the other 
pairs suggests a more complex picture. On the one hand, the speaker pair ZR29 
& HA30, who also nearly exclusively follow pattern a) (cf. Table 28), nonetheless 
realize 70.4 % of their location moves with falls, and about half (48%) of their item 
moves with rises. With these two speakers, circumstantial knowledge suggests that 
the high number of rising item moves might be due to uncertainty or perhaps more 
accurately the overt signaling of uncertainty due to the adoption of a social stance 
for whose outward construction that is seen as desirable.³⁷⁴ XU31 & OA32 present
another case. These two exclusively follow pattern b), and as expected from that,
the great majority (74.2 %) of their location moves are falling, but the item moves


374 The two speakers were both 19 years old at the time of recording and seemed to the
experimenters to present an outward social identity that aims at attributes such as “cuteness”,
“non-threatening femininity”, “fun”, “making an effort to not seem overly scholarly ambitious”.
They frequently giggled during the course of the experiment, palatalized their speech somewhat,
and gave statements to the effect that they didn't know how to proceed but then showed themselves
to do very well after only a little encouragement. This perception is of course filtered through
the lens of the experimenters' own expectations with regard to the presentation of gender as shaped by
their experiences of persons with similar stylistic behavioural choices and their own sociocultural
upbringing, which is not the same as that of the speakers. Thus there might be misperceptions
inherent in this assessment due to these divergent cultural expectations, and any evaluation of
another's internal motives for outward behaviour is of course necessarily conjecture. With these
caveats, it seemed to the experimenters that what was projected by these two speakers was uncertainty.
In Spanish, they also regularly realize the item move with a final rise in their Conc (again
despite usually guessing right), supporting the hypothesis that this expresses a speaker-specific
stylistic choice or stance rather than an attitude towards each individual proposition.

also fall more than half of the time (52.4 %). At the beginning of the task, both speakers
introduce the items with falling contours, but after the sixth completed guessing
sequence, XU31 begins to introduce them with a rising contour, while OA32 keeps
realizing falling item moves until the end of the task. This can be considered in
different ways: realizing a sequence only with falling moves (as OA32 does) can be
interpreted as simply not signaling the coherence of the sequence as a whole. With
an initial item move realized with a fall, OA32 signals only that the utterance or the
move ends, but remains equivocal about its position in a larger sequence. In QUD
terms, he thus does not signal that his assertion is a partial answer to a super-QUD
like WHICH ITEM IS WHERE?, but merely that it answers WHICH ITEM? (specifying
which is a separate commitment, as argued above). Given that it is known to all
participants that realizing only an item move is not a sufficient communication
in the terms of the game, OA32 does not run a great risk of having the turn taken
from him or being misunderstood. Realizing the item moves with falls is therefore
cost-reducing (in terms of articulatory effort and lack of commitment to forward 
planning) for the speaker and only a little cost-increasing for the hearer. Another 
not necessarily rivaling explanation is that falling contours are more likely to be 
understood as signaling an assertion. Because of the game's induced bias with 
regards to information status of the two sets of referents, as proposed above, there 
is some incentive to signal the additional discourse commitment about the identity 
of one of the items pictured on the cards as an assertion: this is the more strongly 
contested set of referents and specifying a member of it has higher informational 
value. As mentioned above, naming one of them requires correct memory retrieval 
both about their identity (WHICH ITEMS ARE IN THE SET?) and their position (THE 
ITEM IS WHERE?), while specifying the location only requires the latter; realizing 
falling intonation on the initial item move would then signal this discourse commitment
clearly. Somewhat paradoxically, this might also go some way towards explaining
the “uncertain” rises on item moves by ZR29 & HA30, and why they do not occur on
the location moves: the speakers might not want to commit to making the inherently
stronger assertion about both item identity and location with confidence, but
risk less by signaling certainty about just the location at which they think an item 
is. OA32 and initially XU31 as well then could be seen to follow one of two strate- 
gies which pattern b) forces a speaker to choose between: signaling both discourse 
commitments separately, while XU31 from sequence six onwards chooses instead 
to forgo this in exchange for highlighting the coherent nature of the entire move 
sequence. A third attempt at explanation could be a purely phonological one: OA32
uses “continuation rises” only within, but not between utterances, thus optimizing
the prosodic unit “utterance”. I do not propose that any one of these explanations is
true to the exclusion of the others. In any case, such a claim could not reasonably
be made on the basis of the evidence available here. Instead, I would suggest that
these are constraints (from different linguistic domains) that all act on the prosodic
realization of a move, and that the heterogeneity in observable outcomes between
speakers indicates both their violability and that they can be differently weighted.

In sum, the evidence considered here suggests that rising contours are most
broadly associated with something describable as “openness” or “incompleteness”.
This includes intention to continue a turn or a topic, an indication that a coherent
unit has not yet reached its end, or the absence of a discourse commitment
(interpretable as uncertainty, which presumably also lends itself to use in some
question types). Falling contours, conversely, are associated broadly with “finality”
or “completeness”, which encompasses closing a turn or a topic, or the presence of
a discourse commitment (also interpretable as certainty). The contour forms are
clearly not very strictly tied to any of these individual meanings, with only distributional
tendencies discernible and context used for specification. Specifically, as
seen, despite lacking commitment to the truth of a proposition, questions are regularly
formed with falling contours, and speaker attitude also seems to have some
leeway in selecting which meaning a contour can be used to cue in a given context.


6.2.3 Distribution of contours in nominal sequences 


The preceding section looked at the distribution of rising and falling contours 
across larger units, i.e. conversational moves and the utterances they are realized 
by. This is evidence for phrasing and a functional differentiation between rises and 
falls at a high level of prosodic structure. In the following, we will focus on phrasing 
within single utterances, specifically in sequences composed of nominal elements. 


6.2.3.1 Noun phrases and other nominal sequences 

This section takes a closer look at how contours are distributed within utterances and 
the conclusions this leads to about (default) prominence differences between several 
words in a sequence. The subset of data used here is that of all noun sequences in the 
nine sub-corpora (seven Concs, one Maptask and one Cuento) consisting of at least 
two nominal forms realized together without breaks and that could form a single 
nominal constituent (noun phrase) together syntactically. Whether they do form a 
noun phrase is only in doubt for a smaller subset, those consisting of a demonstra- 
tive (kay / tsay) or a bare noun, plus another noun. The demonstratives can be used 
both adnominally/attributively, and pronominally. Their form does not distinguish 
between these uses, just as with the English or Spanish demonstrative forms this/that 
or este/ese. If the demonstrative is adnominal and attributive, it forms a constituent 
with the noun as head, modifying the noun ((147)a). The same sequence of words can
be a copular phrase, with the copula omitted in the 3rd person singular present, if the
demonstrative is used as a pronoun. The demonstrative and the noun each form their 
own noun phrase then ((147)b). The same ambiguity exists for combinations of a bare 
(unsuffixed) noun plus another noun either unmarked for case or for locative case 
(-chaw), since a bare noun can either modify the following noun ((148)a, cf. (149)a) or 
also stand with it in a copular relation ((148)b). 


(147) ambiguous use of demonstratives

a. adnominal/attributive use (cf. (114)/Figure 132)
[kay/tsay wanupakush]NP
DEM.PROX/DIST burial
“this/that burial”

b. pronominal use (cf. (145)/Figure 170)
[kay/tsay]NP [wanupakush]NP
DEM.PROX/DIST burial
“this/that is [the/a] burial”


(148) ambiguous N1-N2 combinations

a. N1 as modifier of N2
[manka chawpi-chaw-mi]NP
pot    middle-LOC-ASS
“in the middle of the pot”

b. N1 as copular subject
[manka]NP [chawpi-chaw-mi]NP
pot       middle-LOC-ASS
“the pot [is] in the middle”


In the data considered here, context nearly always distinguishes between these 
uses and the annotation made use of that information. If used attributively, the 
demonstratives are usually realized as part of the initial low stretch of a contour,
phrased together with the noun they modify (cf. (110)/Figure 127, (106)/Figure 123,
(114)/Figure 132). Yet if a demonstrative has the function of pronominal subject in a
copular sentence, it can also be phrased separately, perhaps because it is then often 
also a topic, e.g. in (145)/Figure 170. The need to cue the syntactic or information 
structure interacts with the constraint against tonal crowding here: there is only a 
high tonal target realized on tsay, but the subsequent low target makes it clear that a 
new phrase begins with wanupakush. Thus it seems that phrasing can disambiguate 
structure. All other instances of nominal sequences considered here I take to form 
a nominal constituent, of varying length and complexity. Some examples (not just 
from Conc) are given in (149):


tsay     wanupakush
DEM.DIST burial
“that's the burial”

Figure 170: KP04 Conc Q 065077 (declarative; copular sentence with pronominal subject realized as
two phrases).


(149) Examples of nominal constituents in the Quechua corpora

a. ZZ24 Cuent Q 1783
hacha hampi-ta
plant medicine-OBJ
“plant medicine”

b. XQ33 Conc Q 0543
chugllu-pa aqtsallku-n
corn-GEN   corn.hair-3
“corn hair”

c. QZ13 Conc Q 1358
runa   wahi
person house
“person [and] house”

d. OA32 Conc Q 1832
runa-wan    wayi
person-INST house
“person with/and house”

e. LC34 Conc Q 1249
runa   hawa-n-chaw
person below-3-LOC
“below the person”

f. KP04 MT Q 0703
pinkullu-pa hana-n-pa
flute-GEN   above-3-GEN
“above the flute”

g. SG15 Conc Q 0510
kay      laa  ka-q-chaw
DEM.PROX side COP-AG-LOC
“the one on this side”

h. XQ33 Conc Q 1427
runa   wayin-man    aywa-ykaa-q ladu-n-chaw
person house-3-DEST go-PROG-AG  side-3-LOC
“beside the person going into the house”

i. ZZ24 Cuent Q 1825
huqta chuspita wañu-tsi-nqa-n-pa   alma-n-ta
six   fly-OBJ  die-CAUS-NMLZ-3-GEN soul-3-OBJ
“the souls of the six flies he had killed”


375 https://osf.io/wsb8y/


The examples demonstrate that what are counted here as nominal constituents
are rather varied: (149)a, c and e are examples of binominal phrases denoting a 
single entity where N1 modifies what kind of N2 it is, a conjunct of N1 and N2 in 
a coordination, and a location where N1 specifies relative to what object N2 is to 
be interpreted, respectively; all without internal morphological marking but with 
external (head-)marking in the cases of (149)a and e. (149)b, d, and f denote the same 
kind of semantic relations, but with internal marking (on N1) to specify the relation 
between N1 and N2. (149)g is an example of what might perhaps best be translated
via a headless relative copular clause, or one using the pronominal dummy “one”
in English: the demonstrative kay modifies laa ‘side’, which acts as subject of the
copula ka-, marked with the agentive nominalizer -q and the locative to yield a
single NP with the meaning “the one that is on this side”.³⁷⁶ (149)h and i showcase
particularly complex examples involving relativization (cf. note 355 for a discussion
of ZZ24 Cuent Q 1825). Since this is not a work on syntax, this short exposition
must suffice to point out the variety of nominal constituents here treated more or 
less uniformly. In what follows, I will occasionally make reference to the internal 


376 For an in-depth discussion of the uses of the nominalized copula, see Bendezú Araujo (2021).

variety of these noun phrases when it seems to affect prosody, but it cannot be done
full justice here. One unifying characteristic is that they are all right-headed, both
syntactically and semantically, or can be analyzed thus without problems ((149)c
and d are actually ambiguous in this regard but could well be right-headed). In fact,
among all the nominal constituents treated here, only very few constructions could
not securely be analyzed as right-headed. They involve either the instrumental/comitative
marker -wan, as in (150), or the morpheme -yuq, which marks possession of the entity
denoted by the noun it attaches to as an attribute of another entity (151), or, in one
case, a postnominal locative modification (152). This
general tendency for right-headedness will be relevant for the following analysis.


(150) ZZ24 MT Q 1635 (cf. Figure 171)
runa   ushqu ñawi-wan
person cat   eye-INST
“person with cat eyes”


(151) AZ23 Conc Q 0057
achillku  chugllu-n-yuq
corn.hair corn-3-POSS
“corn hair with its corn”


(152) AZ23 Conc Q 1734
aya    kahun-chaw
corpse box-LOC
“corpse in a box”


6.2.3.2 Distribution of tonal patterns and alignment variant preferences 

This section only treats sequences consisting of nominal elements. This is done 
to somewhat reduce the heterogeneity of the data and improve its comparability 
when quantified. As will be seen in section 6.4, the broad results of this section 
can be extended to phrases with verbs, but verbs also introduce further interest- 
ing complications that would be confounding factors here. The subset of nominal 
sequences considered consists of 487 words. Morphologically, their word forms are 
made up of 1225 syllables, for an average syllable count of 2.52 syllables per word. 
However, in their actual realizations, there were several (especially but not always 
word-final) cases of vowel devoicing and/or deletion and also some cases of conso- 
nant assimilation or deletion, sometimes severely altering the resulting word forms 
so that a different syllabification seemed more appropriate. A revised syllable
count aiming to account for these reduction phenomena comes to 1179 syllables, or
an average word length of 2.42 syllables per word. In the following, syllable counts
will be given with a slash, with the morphological syllables first and the actually
realized ones second. Of the 487 words, 30 were produced either almost completely
flat or with such substantial voiceless stretches that no confident assessment of 
their tonal behaviour could be made. The remaining 457 words were produced in 
what was analyzed as 292 separate phrases (see below for criteria), yielding a ratio 
of 1.57 words and 3.95 / 3.82 syllables per phrase. This compares well with similar 
ratios given for French and Seoul Korean in Jun & Fougeron (2000: 217, 2002: 150), 
where 2.3-2.6 words / 1.2 content words and 3.5-3.9 syllables per Accentual Phrase 
were counted for French, and 1.2 content words and 3.2 syllables for Seoul Korean, 
based on reading tasks. Compared with French, Quechua probably has fewer func- 
tion word types, since a lot of the functions they are used for in French are fulfilled 
by suffixes. However, those words that can be counted as separate function words 
do occur with high token frequency. To refine the comparison, all instances of the
demonstratives kay and tsay, nominalized forms of the copula ka- (mostly ka-q and
ka-ykaa-q), and the numeral huk “one”, occasionally used similarly to an indefinite
marker, were counted as function words, even if suffixed.³⁷⁷ All other occurring
words are content words. Nominalized tokens of ka- occur 36 times, kay 19, tsay
30 and huk 5 times, for 90 function words out of 487 in total. Out of those, 5 tokens
were realized in phrases that were discounted because of flat or otherwise indeterminable
realization. Applying these changes yields a content word ratio per phrase
of (457-85)/292 = 1.27, rather similar to those given for French or Korean.³⁷⁸ This
would suggest that what has here been counted as a phonological phrase is similar 
in its extent to what has been identified as an Accentual Phrase (AP) in those lan- 
guages. However, Igarashi (2014) makes the case for Japanese that dialects differ in 
whether an AP is allowed to include more than one word or not. Thus, APs would 
not have to include the same average number of words across different languages. 
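
The ratios reported in this paragraph follow directly from the counts given in the text. As a minimal sketch (in Python; all variable and function names are chosen here purely for illustration and are not part of the original analysis), the arithmetic can be retraced as follows:

```python
# Counts reported in the text for the nominal-sequences subset.
WORDS_TOTAL = 487
SYLLABLES_MORPHOLOGICAL = 1225
SYLLABLES_REALIZED = 1179
WORDS_EXCLUDED = 30  # flat or largely voiceless tokens
PHRASES = 292

# Function-word tokens as counted in the text.
FUNCTION_WORDS = {"ka- (nominalized)": 36, "kay": 19, "tsay": 30, "huk": 5}
FUNCTION_WORDS_EXCLUDED = 5  # function-word tokens in discounted phrases


def ratio(numerator: float, denominator: float, digits: int = 2) -> float:
    """Return numerator / denominator rounded to the given number of digits."""
    return round(numerator / denominator, digits)


words_analyzed = WORDS_TOTAL - WORDS_EXCLUDED
function_words_analyzed = sum(FUNCTION_WORDS.values()) - FUNCTION_WORDS_EXCLUDED
content_words_analyzed = words_analyzed - function_words_analyzed

summary = {
    "syllables_per_word_morphological": ratio(SYLLABLES_MORPHOLOGICAL, WORDS_TOTAL),
    "syllables_per_word_realized": ratio(SYLLABLES_REALIZED, WORDS_TOTAL),
    "words_per_phrase": ratio(words_analyzed, PHRASES),
    "content_words_per_phrase": ratio(content_words_analyzed, PHRASES),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The sketch only retraces the reported word-level and phrase-level ratios; the syllables-per-phrase figures are not recomputed, since the syllable totals for the 457 analyzed words are not given separately in the text.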


377 These are function words semantically, because they have more of a grammatical function 
than a lexical meaning. However, morphosyntactically they are nouns and verbs like any word 
belonging to those classes: the demonstratives kay and tsay and the numeral huk can be followed 
by all nominal suffixes, and the possibility that they modify another noun by being placed in front 
of it without any suffixes is also available for semantically richer, more content-like nouns as well 
(cf. (149)a), although there are probably restrictions on such a use. The verb of being ka- likewise 
takes the same verbal suffixes and has the same morphosyntactic behaviour as any other verb, 
except that it is normally unrealized in the third person present when serving a copular function 
(not when it functions as a verb of existence or location). Looking at actual usage, the counts here
suggest that these “function words” take up a sizeable share of token frequencies, and my impression
is also that the five word forms given here will occur without suffixes, as monosyllables, more
often than “content” words.

378 Cf. also the ratio of 1.12 content words per pitch accent for Huari Spanish in section 5.1.1.2.

The criteria used to group a stretch of Quechua speech into phrases were the
following: in general, a phrase had to conform to one of the tonal contour shapes 
established in section 6.1. That is to say, with the exception of phrases including 
localized pitch accents on stress positions in words of Spanish origin (the “grafted” 
pattern), a phrase consists of exactly one high tone, possibly forming a plateau-like 
realization but not discontinuously so, either preceded by a low tone (rising 
pattern), or followed by it (only-falling pattern), or both followed and preceded by 
one (rise-falling pattern). Such a pattern had to be produced in a continuous, fluent 
fashion in order to be counted. Word sequences were excluded when they were 
completely flat or the pitch signal was too discontinuous, e.g. because of longer 
hesitations, or because of devoicing of phonologically voiced segments, but not just 
because of the presence of unvoiced consonants, if the overall pattern could still be 
recognized with some confidence. Since the data are semi-spontaneous, and Huari 
Quechua has few voiced consonants, stronger exclusion criteria would have been 
too restrictive. Within the three main categories of rises, only-falls and rise-falls,
phrases were further distinguished according to the position of the tonal transition(s):
in the case of the rises and only-falls, whether it (the rise or the fall, respectively)
took place at the final phrase-internal word boundary, at a prefinal word
boundary, on the penult of the final word, on the penult of a prefinal word, or at another
syllabic position in the final word or in a prefinal word, for six subcategories each. The
defining criterion
for the location of the transition was the boundary of the high tonal target, i.e. the
left boundary in the case of rises, and the right boundary in the case of falls. Thus
the position of a tonal transition was classified according to where the local pitch
maximum was determined to be located within the context of a rise or fall. This was
done by visual and auditory inspection of the annotated utterance in Praat. In the
case of the rise-falls, both the rise and the fall could in principle take place at either
a final or prefinal word boundary, or at the penult or another syllabic position in
either a prefinal or the final word. Here I refer only to phrase-internal boundaries
as word boundaries, so “the final word boundary” is that between the penultimate
and the final word, and a “prefinal word boundary” is a boundary between any
two words inside the phrase not involving the final word. Out of all the possible
combinations, the following eleven subcategories were initially formed to class the 
phrases: 


(153) Alignment subcategories of the rise-falling contours

a) with the rise taking place at the final word boundary and the fall in the
penult of the final word,

b) with the rise taking place at a prefinal word boundary . . . and the fall at
the final word boundary,

c) . . . and the fall in the penult of the final word,

d) . . . and the fall later in the penult of a prefinal word,

e) with the rise taking place in the penult of a prefinal word . . . and the fall
at the final word boundary,

f) . . . and the fall in the penult of the final word,

g) . . . and the fall in the penult of a later prefinal word,

h) both the rise and the fall on the last two syllables of the final word,

i) both the rise and the fall on the last two syllables of a prefinal word,

j) another combination in which the high stretch covers most of the final
word,

k) another combination in which the high stretch does not cover most of
the final word.


It turned out that combinations (153) c), d), f), and j) do not occur in the corpus, 
which is why they are not listed in Table 32 below. Combination e) possibly occurred 
once, but that token is ambiguous between the rise taking place on the penult and 
at the initial word boundary of the prefinal word, because that word is bisyllabic. 
It has therefore been counted as an instance of combination b), and e) also does 
not occur in the table. Combinations h) and i) lump the syllabic positions of penult 
and final syllable together. This was done in order to reduce the number of combi- 
nations, and because we have seen that final peak alignment varies between these 
two positions in this contour (cf. section 6.1.6). 
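
The grouping criteria and the six alignment subcategories for rises and only-falls can be summarized schematically. The following Python sketch is only an illustration of the classification logic, not the procedure actually used, which relied on visual and auditory inspection in Praat. The tone strings, function names and labels are assumptions introduced here, and the “grafted” pattern is left out:

```python
from itertools import groupby

# Tone sequences of the three main contour types; "grafted" patterns with
# pitch accents on words of Spanish origin are deliberately not covered.
CONTOUR_TYPES = {"LH": "rising", "HL": "only-falling", "LHL": "rise-falling"}

# Two word positions x three anchors = six subcategories (rises, only-falls).
WORD_POSITIONS = ("final", "prefinal")
ANCHORS = ("word boundary", "penult", "other syllable")


def contour_type(tones: str) -> str:
    """Classify a phrase by its tonal targets, e.g. "LHHL" -> "rise-falling".

    Adjacent identical tones are collapsed first, so that a continuous high
    plateau counts as a single high tone.
    """
    collapsed = "".join(tone for tone, _ in groupby(tones))
    if collapsed not in CONTOUR_TYPES:
        raise ValueError(f"not a single phrase under these criteria: {tones!r}")
    return CONTOUR_TYPES[collapsed]


def transition_subcategory(word_position: str, anchor: str) -> str:
    """Name the subcategory for the location of a rise or an only-fall."""
    if word_position not in WORD_POSITIONS or anchor not in ANCHORS:
        raise ValueError(f"unknown location: {word_position!r}, {anchor!r}")
    return f"{word_position} {anchor}"


# All six subcategories, in the column order used for the tables.
SUBCATEGORIES = [
    transition_subcategory(word, anchor)
    for anchor in ANCHORS
    for word in WORD_POSITIONS
]
```

A tone string with two separate high targets, such as "LHLH", is rejected by `contour_type`, mirroring the requirement that a phrase contain exactly one (possibly plateau-like) high tone.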

In this manner, using breaks only as additional boundary cues, it was possible 
to manually delimit and classify nearly all phrases unambiguously. Occasionally, 
it was difficult to decide whether several words in succession formed one or more 
phrases, with pitch movement possibly attributable to phrase tones on some or 
all of the words, but scaled much larger on the final one. In those cases, it had to be
decided whether the prefinal movement constituted phrasal tonal movement or just random
fluctuation. They were individually decided based on whether the prefinal movement was
audibly perceptible, whether each identifiable tonal target plausibly corresponded
to one of the assumed underlying tones of a phrase, how the scale of the prefinal
movement compared to the final movement given the speaker's range, and
whether it surpassed a threshold of 7 Hz difference. An example of such a case is
Figure 171 (cf. (150)): on the two prefinal words runa and ushqu, rising pitch movements
range from a local minimum in the first syllable to a local maximum in
the second of 136-146 Hz and 133-142 Hz, respectively. These ranges are clearly
smaller than that on the final word, with 134-159 Hz from minimum to maximum.
Here the prefinal rises could be audibly perceived (although weakly), and not
only three peaks, but also three low tonal targets, could be made out, corresponding
to a sequence LH LH LH of three rising contours. Therefore, this example was
counted as consisting of three rising phrases, even though the pitch difference on
the first two words is only just above the 7 Hz threshold.
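
Of the four decision criteria, only the 7 Hz threshold and the relative scaling are straightforwardly quantitative. As a minimal illustration (a Python sketch with names introduced here for exposition; the perceptual and tonal-target criteria remain matters of judgment and are not modelled), the values reported for Figure 171 can be checked like this:

```python
THRESHOLD_HZ = 7.0  # minimum pitch difference named in the text

# (local minimum, local maximum) in Hz for the three words of Figure 171.
MOVEMENTS_HZ = {
    "runa": (136.0, 146.0),
    "ushqu": (133.0, 142.0),
    "final word": (134.0, 159.0),
}


def pitch_range(minimum_hz: float, maximum_hz: float) -> float:
    """Size of a rise in Hz, from local minimum to local maximum."""
    return maximum_hz - minimum_hz


def surpasses_threshold(
    minimum_hz: float, maximum_hz: float, threshold_hz: float = THRESHOLD_HZ
) -> bool:
    """True if the movement is strictly larger than the threshold."""
    return pitch_range(minimum_hz, maximum_hz) > threshold_hz


ranges = {word: pitch_range(*targets) for word, targets in MOVEMENTS_HZ.items()}
final_range = ranges["final word"]

for word, size in ranges.items():
    relative = size / final_range
    print(
        f"{word}: {size:.0f} Hz, {relative:.2f} of the final rise, "
        f"above threshold: {size > THRESHOLD_HZ}"
    )
```

Both prefinal rises pass the threshold test, but only by 2-3 Hz, and each spans less than half the range of the final rise, which is why the additional perceptual criteria were needed to settle the case.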


Figure 171: ZZ24_Cuent_Q_1635.³⁷⁹


In case of doubt, the tendency was to decide such cases in favour of a grouping
involving more phrases rather than fewer.³⁸⁰ Another ambiguous situation for analysis
arose whenever a word with a rising contour on a syllabic position was directly
followed by another word with an only-falling contour on a syllabic position. This 
could possibly also be counted as a single phrase with a rise-falling contour. Recall 
section 6.2.2 where in similar cases, when counting those as two separate units, 
location moves could nearly always be separated from item moves. Here, unless 
the pitch transition between the two words was completely smooth and there was 
absolutely no interruption whatsoever, this was counted as constituting a phrase 
break, instead of one rise-fall phrase, which contributes to why subcategory (153) 
f) has no counts. 

Using these analytical criteria, the dataset was found to contain the following 
distribution of phrases, in rising (Table 30), only-falling (Table 31), and rising-falling 
(Table 32) contours. 


379 https://osf.io/4aqhd/ 
380 In section 6.4, it will be argued that such phrases with systematic scaling differences form 
larger prosodic units together.

Table 30: Rising contour patterns in the Quechua nominal sequences subset, in all phrases (rows 2-4)
and in phrases containing more than one word (rows 5-7).

phrases  measure                   final     prefinal  final     prefinal  final     prefinal  total
                                   word      word      penult    penult    other     other
                                   boundary  boundary                      syllable  syllable
all      counts                    26        8         63        12        40        4         153
all      % of all rises            17        5.2       41.2      7.8       26.1      2.6       100
all      % of all phrases          8.9       2.7       21.6      4.1       13.7      1.4       52.4
>1-word  counts                    26        8         28        12        12        4         90
>1-word  % of all >1-word rises    28.9      8.9       31.1      13.3      13.3      4.4       100
>1-word  % of all >1-word phrases  17.2      5.3       18.5      8         8         2.6       59.6


Table 31: Only-falling contour patterns in the Quechua nominal sequences subset, in all phrases (rows
2-4) and in phrases containing more than one word (rows 5-7).

phrases  measure                       final     prefinal  final     prefinal  final     prefinal  total
                                       word      word      penult    penult    other     other
                                       boundary  boundary                      syllable  syllable
all      counts                        4         1         33        0         12        1³⁸¹      51
all      % of all only-falls           7.8       2         64.7      0         23.5      2         100
all      % of all phrases              1.4       0.3       11.3      0         4.1       0.3       17.5
>1-word  counts                        4         1         9         0         5         1         20
>1-word  % of all >1-word only-falls   20        5         45        0         25        5         100
>1-word  % of all >1-word phrases      2.7       0.7       6         0         3.3       0.7       13.3


381 This single count comes from a case with “inherited” pattern on the Spanish word último “the last”.

In the rising contours, three of the four tokens in the least frequent category, with
the rise taking place on a syllable other than the penult in a prefinal word (column 
8 in Table 30), are cases of “inherited” patterns on oxytonic Spanish loanwords, 
with the Spanish stress position causing the rise to take place on the final syllable of 
the prefinal word. In the fourth token, the rise takes place gradually over the whole 
word, which seems to be an infrequent phonetic variant. Hence a contour with the 
rise in a prefinal word taking place on a syllable other than the penult seems una- 
vailable without a lexically specified prominent syllable like in the Spanish loan- 
words (cf. the OT-analysis in section 6.3.2). In the rise-falling contours, no tokens 
were found where the rise took place at a word boundary, but the fall at a syllabic 
position, or vice versa, except for combination a), where the rise takes place at the 
final phrase-internal word boundary and the fall on the penult of the final word. 
This latter combination is a special case deriving from phrase-final tonal crowding 
(cf. the OT-analysis in section 6.3.1). 


Table 32: Rising-falling contour patterns in the Quechua nominal sequences subset, in all phrases
(rows 2-4) and in phrases containing more than one word (rows 5-7).

tonal transition locations (cf. (153))

phrases  measure                             a)    b)    g)    h)    i)    k)    total
all      counts                              12    6     2     56    8     4     88
all      percent of all rise-falls           13.6  6.8   2.3   63.6  9.1   4.6   100
all      percent of all phrases              4.1   2.1   0.7   19.2  2.7   1.4   30.1
>1-word  counts                              12    6     2     10    8     3     41
>1-word  percent of all >1-word rise-falls   29.3  14.6  4.9   24.4  19.5  7.3   100
>1-word  percent of all >1-word phrases      8     4     1.3   6.6   5.3   2     27.2

A first result from all three tables taken together is that rises are slightly more frequent
(n = 153; 52.4%) than only-falls and rise-falls taken together (n = 139; 47.6%),
but rises and falls broadly each make up about half of all tokens. Recall that in
section 6.2.2, which considered the final phrases of individual moves, falling moves
turned out to be in the majority. That rising phrases are relatively more frequent at this lower level
of observation is compatible with the hypothesis that they cue incompleteness,
with moves in which the final phrase is falling also preferentially containing
rising phrases prefinally. Secondly, patterns making use of a word boundary as
tonal transition point are in a minority (34 tokens/22.22% of rises, 5 tokens/9.8%
of only-falls, 18 tokens/20.45% (categories a) and b)) of rise-falls) compared to those
using a syllabic position, overall. However, these counts include many phrases consisting
of only a single word, where word boundaries cannot be the tonal transition
locus by definition. In multi-word phrases (rows 5-7 each of Tables 30-32), patterns
making reference to a word boundary make up a more sizable share, namely
37.8% of rises, 25% of only-falls and 43.9% of rise-falls. Yet while they are certainly
a relevant variant for multi-word phrases, word boundary-patterns are also clearly
not in complementary distribution with patterns making reference to a syllabic
position, with the former used in multi-word phrases, and the latter only for single-word
phrases. Instead, patterns making reference to a syllabic position are the
majority variant in all phrases.
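
The shares cited in this paragraph can be derived from the counts in Tables 30-32. A short Python sketch (the dictionary layout and function name are assumptions made here for illustration; percentages are rounded to one decimal place, so 22.22% and 20.45% appear as 22.2 and 20.5):

```python
# (word-boundary tokens, all tokens) per contour type, from Tables 30-32.
# For rise-falls, word-boundary tokens are categories a) and b) of (153).
ALL_PHRASES = {"rises": (34, 153), "only-falls": (5, 51), "rise-falls": (18, 88)}
MULTI_WORD = {"rises": (34, 90), "only-falls": (5, 20), "rise-falls": (18, 41)}


def word_boundary_share(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percentage of tokens per contour type that use a word boundary."""
    return {
        contour: round(100 * boundary / total, 1)
        for contour, (boundary, total) in counts.items()
    }


shares_all = word_boundary_share(ALL_PHRASES)
shares_multi = word_boundary_share(MULTI_WORD)

for contour in ALL_PHRASES:
    print(
        f"{contour}: {shares_all[contour]}% of all phrases, "
        f"{shares_multi[contour]}% of multi-word phrases"
    )
```

In every row the word-boundary share stays below 50%, which is the basis for treating syllabic-position patterns as the majority variant even among multi-word phrases.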


Table 33: Distribution of phrase patterns across the nominal sequences subset Quechua, ordered 
according to subcorpus, phrase type (rising, only-falling, rise-falling), tonal transition location (all 
(=syllabic position + word boundary) vs word boundary), and phrase length (all vs. only multi-word 
phrases). 


R = rising, F = only-falling, RF = rise-falling; all = syllabic position + word boundary;
WB = word boundary; -- = no phrases of this type.

                         R         F        RF           % of phrases at WB
Subcorpus          all  WB   all  WB   all  WB  total       R      F     RF  total

All phrases
TP03&KP04 MT        30   4    15   1    20   8     65    13.3    6.7     40     20
TP03&KP04 Conc      17  10     1   0     7   4     25    58.8      0   57.1     56
SG15&QF16 Conc       9   1     7   1     7   0     23    11.1   14.3      0    8.7
QZ13&OZ14 Conc      16   2     3   1     7   0     26    12.5   33.3      0   11.5
XU31&OA32 Conc      13   0     4   0    12   0     29       0      0      0      0
ZR29&HA30 Conc      12   5    14   2     8   0     34    41.7   14.3      0   20.6
XQ33&LC34 Conc      16   5     2   0     0   0     18    31.3      0     --   27.8
AZ23&ZZ24 Conc      18   5     2   0     7   0     27    27.8      0      0   18.5
AZ23&ZZ24 Cuento    22   2     3   0    20   1     45     9.1      0      5    6.7

Only multi-word phrases
TP03&KP04 MT        11   4     7   1    15   8     33    36.4   14.3   53.3   39.4
TP03&KP04 Conc      13  10     0   0     6   4     19    76.9     --   66.7   73.7
SG15&QF16 Conc       7   1     2   1     5   0     14    14.3     50      0   14.3
QZ13&OZ14 Conc       9   2     2   1     1   0     12    22.2     50      0     25
XU31&OA32 Conc       4   0     3   0     3   0     10       0      0      0      0
ZR29&HA30 Conc       8   5     5   2     6   0     19    62.5     40      0   36.8
XQ33&LC34 Conc      12   5     0   0     0   0     12    41.7     --     --   41.7
AZ23&ZZ24 Conc      16   5     1   0     0   0     17    31.3      0     --   29.4
AZ23&ZZ24 Cuento    10   2     0   0     5   1     15      20     --     20     20
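
The percentage columns of Table 33 follow directly from the counts: each is the number of phrases with the tonal transition at a word boundary divided by the number of all phrases of that type. The following minimal sketch recomputes them for two rows copied from the table. The function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Counts copied from Table 33. Each phrase type maps to a pair:
# (all phrases of that type, those with the transition at a word boundary).
TABLE_33_ROWS = {
    "TP03&KP04 MT (all phrases)": {
        "rising": (30, 4), "only-falling": (15, 1), "rise-falling": (20, 8),
    },
    "TP03&KP04 Conc (multi-word only)": {
        "rising": (13, 10), "only-falling": (0, 0), "rise-falling": (6, 4),
    },
}


def wb_share(n_all, n_boundary):
    """Percentage of phrases with a word-boundary transition; None if n_all is 0."""
    if n_all == 0:
        return None  # corresponds to "--" in the table
    return round(100 * n_boundary / n_all, 1)


def row_shares(row):
    """Per-type and overall word-boundary percentages for one table row."""
    shares = {ptype: wb_share(*counts) for ptype, counts in row.items()}
    shares["total"] = wb_share(
        sum(n for n, _ in row.values()), sum(b for _, b in row.values())
    )
    return shares


for name, row in TABLE_33_ROWS.items():
    print(name, row_shares(row))
```

Returning None for empty cells keeps phrase types that a speaker pair never produced apart from types that were produced but never aligned with a word boundary.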


Individual speaker preference also seems to play a role in the variation between
word boundary and syllabic position as tonal alignment anchor: speaker pair XU31
& OA32 do not produce word-boundary-aligned phrases at all in the 26 nominal
phrases of their Conc corpus, and speaker pair AZ23 & ZZ24 produce only 7
word-boundary-aligned phrases in the 72 nominal phrases of their Conc and Cuento
corpora together, or 9.7%. At the other end, TP03 & KP04 produce 29 of them
(including 11 of the 12 tokens overall of pattern a) among the rise-fall phrases) in the 90
nominal phrases (52 of them multi-word) of their Conc and MT corpora together,
amounting to 32.2% of all their phrases, and 55.8% of their multi-word phrases.
Table 33 shows that this difference persists also if only the Conc corpora are com- 
pared, suggesting that an explanation in terms of different functions is less likely. 
I also could not find evidence that the distinction between locating the tonal transition
at a word boundary or a syllabic position is functional in a careful qualitative
investigation of all the Huari Quechua data in their contexts. The most likely
hypothesis is therefore that the difference is one of individual speaker preference,
i.e. a prosodic feature that is socioindexically meaningful, part of a speaker's style.


6.2.3.3 Phrasal prominence and relation to information structure 
and information status 

This section investigates a feature of multi-word phrases that does seem to serve 
a functional purpose. The argument here is based on the assumption from metri- 
cal phonology that metrically assigned relative prominence extends to all levels 
of the prosodic structure, at each level (n) marking exactly one (the head) of the 
subordinate metrical constituents contained within it at the level below (n-1) as 
strong (s), and all others as weak (w, cf. section 3.3.1). I argue that this metrical 
structure also exists in Huari Quechua, and that the location of the high tone and 
the stretch (plateau) extended from it cue prominence at the phrase level.³⁸² I
follow Ladd (2008: 223, 252), Calhoun (2010b), Jun (2014b), Féry (2017) in assuming
that there is a default or unmarked prominence pattern, specific to languages and
constructions,³⁸³ which determines the location of the strong position in a phrase.
This default prominence is compatible with a range of contexts, specifically at least
those contexts where the focus domain as determined from the QUD is as broad
as the phrase itself. Such an unmarked default is argued to be crucial for assigning
a prominence structure based on expectation even in the (relative) absence
of phonetic cues (Ladd 2008: 257–259, 271; Calhoun 2010b). My hypothesis is that
in the subset of the Quechua data investigated here, nominal sequences that are also
syntactically right-headed in the majority, default prominence in a phrase is rightmost
(as in Spanish and many other languages, cf. Ladd 2008: 252),³⁸⁴ the last word
being strong (s), and the others weak (w). The findings in this section will also shed
light on whether, for the data here, a similarly indirect and probabilistic relationship
between prosodic cues, metrical structure and information structure can be
assumed as the discussion in section 3.7.3 has suggested holds for different varieties
of Spanish as well as Cuzco Quechua.


382 It is usually assumed that high pitch, as the result of high tones, is more prominent than low
pitch resulting from low tones, ceteris paribus. This has been shown in perception experiments
and is part of the predictions of the Effort Code (Gussenhoven 2002a: 50, 2004: 85). Note that pitch
height here is relational, never absolute, and relative pitch excursion size is therefore the most
important cue to prominence. The Effort Code is based on the association of pitch movement with
articulatory effort, so that passages with more pitch variation are perceived as more effortful, and
in turn as more prominent. That is why I assume that in the rising and only-falling contours it is
the word on which the tonal transition takes place that is most prominent (at the same time, the
Effort Code also predicts rising contours to be more prominent than falling ones). Note however
that language-specific associations of pitch and information structure can sometimes go against
this general trend: In Akan, Kügler & Genzel (2012) found that corrective focus is associated with
relatively lower pitch.

383 It seems that languages can differ both in where main prominence falls within the same type
of construction, and in whether it changes across constructions in the same language. In polar
questions, English places prominence on the final lexical noun and on the verb only if no noun
is available, while Russian always places it on the verb. In Russian, statements differ from polar
questions in that main prominence goes instead to the final lexical noun (cf. Ladd 2008: 224-225).

The assumption of phrasal prominence and of the relation between high tones 
and prominence is used here to generalize across the various identified phrasal 
contours. For the nominal sequences subset of the data, I assume that the relation 
between contours and metrical structures is the following, with phrasal promi- 
nence either final (on the final word in the phrase) or prefinal (on a prefinal word): 
- rising contours: those contours with the rising tonal transition either at the
  final boundary between words, or on a syllabic position in the final word cue
  final prominence. All others cue prefinal prominence.
- rising-falling contours: those contours with the rising tonal transition taking
  place either at the final word boundary or at a syllabic position within the final
  word, i.e. where the high tone or plateau is entirely within the final word, cue
  final prominence. Contours where the final word is entirely excluded from the
  high plateau cue prefinal prominence. Those where a plateau begins before the
  final word but extends into it are counted as indeterminate.
- only-falling contours: only those with the falling tonal transition in the
  penult or the final syllable of the final word cue final prominence, all others
  cue prefinal prominence.
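
The three rules above can be sketched as a small decision procedure. This is only an illustration of the logic, not the procedure actually used for the annotation: the `Phrase` record, its field names and the encoding of a word-boundary transition (stored as the index of the word following the boundary) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    contour: str      # "rising", "rise-falling" or "only-falling"
    n_words: int      # number of words in the phrase
    # Rising / only-falling: 0-based index of the word carrying the tonal
    # transition. A transition at a word boundary is stored (by assumption)
    # as the index of the word that follows the boundary.
    transition_word: int = 0
    # Only-falling: True if the fall is on the penult or final syllable.
    fall_on_last_two_syllables: bool = False
    # Rise-falling: first and last word (0-based) covered by the high
    # tone or plateau.
    plateau_start: int = 0
    plateau_end: int = 0


def classify(p):
    """Return 'final', 'prefinal' or 'indeterminate' for one phrase."""
    last = p.n_words - 1
    if p.contour == "rising":
        return "final" if p.transition_word == last else "prefinal"
    if p.contour == "only-falling":
        on_final = p.transition_word == last and p.fall_on_last_two_syllables
        return "final" if on_final else "prefinal"
    if p.contour == "rise-falling":
        if p.plateau_start == last:   # plateau entirely within the final word
            return "final"
        if p.plateau_end < last:      # final word excluded from the plateau
            return "prefinal"
        return "indeterminate"        # plateau starts earlier, extends into it
    raise ValueError(f"unknown contour type: {p.contour!r}")
```

Only the rise-falling contours can come out as indeterminate, which matches the wording of the rules: for rising and only-falling contours the location of a single transition decides the classification.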


384 But not all, such as Japanese (cf. Jun 1993, 2005b; Venditti et al. 1996; Venditti et al. 2008).
Venditti et al. (2008: 466, 477, 480–481) are at pains to emphasize that there is no structural or surface
equivalent to a “nuclear” or “sentence accent” in Japanese; pitch accent tones are only assigned
lexically and never postlexically. Prominence relationships or focus are signaled by pitch range 
expansion and compression and phrasing only. Mostly the same applies to (Seoul) Korean, except 
that no lexical accents exist either. However, it is still the case that culminative tonal marking at the 
default phrase edge is compatible both with a constituent positioned at that edge being narrowly 
at-issue and a QUD as broad as the phrase (Venditti et al. 2008: 481), just as in Spanish or English, 
except that the default phrase edge is to the left in Japanese, instead of the right. It seems plausible 
to take this edge-inversion as an effect of the inversion in the default metrical relation (s-w instead 
of w-s) between constituents, and thus to say that the differences only lie in the cues and this 
basic direction in the metrical structure, which nonetheless underlies all four languages, as Ladd 
(2008: 278–279) does. Note that this account of Japanese focus marking has been called into question
by Ishihara (2011, 2016, 2017), who suggests that metrical structure in Japanese does not observe
culminativity, and that phrasing is instead determined by syntax.

In addition, when a noun sequence is realized on several phrases, the entire sequence
is counted as cueing final prominence only if the phrases are all rising or rise-falling
and the final one clearly has the largest pitch span (meaning that sequences like
(150), with a pitch contour as seen in Figure 171, will be counted as having final
prominence). This implies that I take such phrases (PhPs) to form a larger unit
together (either an IP or a recursively iterated larger PhP), in which the same default
prominence relation (w-s) holds. If the phrases are of the same type but the final
one does not have the largest pitch expansion, the sequence is counted as cueing
prefinal prominence. All other combinations of phrases in a sequence are counted
as indeterminate.
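
The rule for sequences realized on several phrases can be sketched in the same way. The sketch below follows one reading of the paragraph above and is an assumption, not the original procedure: "clearly the largest" is simplified to strictly larger than every earlier pitch span, and the input format (one `(contour, pitch_span)` pair per phrase) is invented for the illustration.

```python
RISING_TYPES = {"rising", "rise-falling"}


def classify_sequence(phrases):
    """Classify a noun sequence realized on several phrases.

    phrases: list of (contour, pitch_span) pairs in linear order, one per
    phrase; pitch_span is any consistent measure of pitch excursion.
    """
    contours = [contour for contour, _ in phrases]
    spans = [span for _, span in phrases]
    final_is_largest = all(spans[-1] > s for s in spans[:-1])

    # Final prominence: all phrases rising or rise-falling, final span largest.
    if all(c in RISING_TYPES for c in contours) and final_is_largest:
        return "final"
    # Prefinal prominence: same phrase type throughout, final span not largest.
    if len(set(contours)) == 1 and not final_is_largest:
        return "prefinal"
    # Every other combination of phrases.
    return "indeterminate"
```

In practice "clearly" would probably require a perceptual or numerical margin rather than a bare inequality. Such a margin could be added as a parameter without changing the structure of the rule.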

Classing the nominal sequences dataset in this way means taking as basic unit 
not an individual phrase but an individual sequence (often but not always corre- 
sponding to a syntactic noun phrase). This was done in order to be able to say some- 
thing about the relationship of words in a sequence where words are individually 
phrased. The absolute counts are thus different from the ones in Tables 30-33. 

Table 34 gives counts for the prominence patterns (final, prefinal, indeterminate)
per nominal sequence classed according to information structure within
the noun sequence. The information structural annotation the classification is 
based on was done on the entire Quechua dataset. That is to say, even though only 
the nominal sequences are considered here, for each corpus the entire discourse 
was analyzed and the annotation for the nominal sequences results from this. The 
nominal sequences were categorised according to which information structural 
function they most likely played based on an analysis of the discourse context, 
using the QUD model of discourse as well as the context model of Farkas & Bruce 
(2010) as analytical guidance, as laid out in section 3.7. The categories are thus 
explicitly not based on formal criteria like morphosyntactic or prosodic form. 
Such a contextual analysis has its limits, because an implicit QUD cannot always 
be determined exactly in actual conversation. When context did not allow for a 
precise determination of the implicit QUD in order e.g. to decide between a more 
or less narrow focus domain, the implicit QUD making fewer contextual assumptions
(i.e. in general a broader focus) was chosen. The categories in the table are
to be understood as follows: Broad focus means that the context was judged
to most likely imply an (implicit, rarely explicit) QUD that does not impose a 
focus-background division within the nominal sequence itself, while the sequence 
as a whole is at-issue with respect to the current QUD. Topic (broad) also means 
that the context was judged to most likely imply an (implicit, rarely explicit) QUD 
that does not impose a focus-background division within the nominal sequence. 
Here however, the sequence as a whole is not at-issue with respect to the current 
QUD (i.e., backgrounded), but instead topical, i.e. it serves as a referential or
predicational anchor or restriction for the current at-issue material, relating it to the
discourse progression via relevance. Topic-comm means that the context made it
plausible to infer a division within the nominal sequence in terms of at-issueness,
such that the earlier part³⁸⁵ is a topic (backgrounded with respect to the current
QUD but establishing a relation of relevance with regard to a preceding QUD)
and the later part at-issue with respect to the current QUD. Those cases where context
allowed ambiguous noun sequences to be resolved as copular (cf. (147), (148)) make
up the majority of this category, with the reasoning that the possibility to freely
omit subjects in Quechua if they are already active in the discourse (pro-drop) 
conversely makes it likely for them to be topical if not omitted (cf. Lambrecht 1994: 
137). Prefinal focus (narrow) also means that context made it plausible to infer a 
division within the nominal sequence in terms of at-issueness, such that a prefinal 
constituent could be taken to be at-issue, with the domain of at-issueness mini- 
mally excluding the final word. Final focus (narrow) is the complement to prefi- 
nal focus (narrow), in that the domain of at-issueness here could be understood to 
cover minimally the final word and to exclude at least one prefinal word. Prefinal 
focus (corr) and Final focus (corr) cover the same context-inferred separations 
of the nominal sequence as prefinal (narrow) and final (narrow), respectively, 
except that it was also possible to make out a salient alternative, uttered in the 
preceding context, to which the focal constituent is a correction (i.e. as a divergent 
answer to the same QUD). 


Table 34: Prominence patterns in the noun sequences dataset of Quechua according to information 
structure relation within the noun sequences. 


                           information structure within the noun sequence

prominence       broad   topic    topic-   prefinal   final      prefinal   final    total
                 focus   (broad)  comm     focus      focus      focus      focus
                                           (narrow)   (narrow)   (corr)     (corr)
final              50      39       1         5          9          0         1       105
prefinal           19       7       1        16          0          1         0        44
indeterminate      40      22       8         2          1          0         0        73
total             109      68      10        23         10          1         1       222


385 Categories do not specify between which words in a sequence the division takes
place, only the order. The 487 words occurred in 222 nominal sequences consisting of at least 2 and
on average 2.19 words.


Table 35: Prominence patterns in the noun sequences dataset of Quechua according
to information status (givenness/newness) within the noun sequences.


                 Information status within the noun sequence

prominence       all new   all given   given-new   new-given   total
final               40        33          20          12        105
prefinal            28         6           3           7         44
indeterminate       41        10          19           3         73
total              109        49          42          22        222


Table 35 classes the same data according to the information status profile of the
nouns in the sequence. The categories used in the table are informed mainly by the
classification in Baumann & Riester (2012), who themselves incorporate insights
from several preceding works, such as Chafe (1976, 1994); Gundel et al. (1993);
Prince (1981, 1992). However, several distinctions made in those works have here 
been classed on either side of a binary division, given or new. This was done as an 
adaptation to the type of data as well as in order to reduce the number of categories 
to a manageable size, so that the main question regarding a relationship between 
prominence and information status within the nominal sequence could be inves- 
tigated. The categories were created using the following annotation schemes: if a 
referent occurred for the first time in the discourse (the respective task corpus), 
the referring expression it was denoted by was annotated as new; if it occurred 
repeatedly, as given. This corresponds mostly to the concept of referential given- 
ness of Baumann & Riester (2012), but disregards the several more fine-grained 
distinctions they make within the broader category of referential givenness. In 
Conc and MT, the “new” here corresponds most closely to their r-unused-known
(“discourse-new item which is generally known”, Baumann & Riester 2012: 138), or
r-environment (discourse-new items that refer to “visible objects in the communicative
environment which are not available in the speech setting by default”, cf. 139),
because the participants were able to familiarize themselves with all the referents
(the figures on the cards) before playing (Conc) or had their images in front of them
(MT) during the game. In Cuento, what is “new” here can occasionally also be their
r-new (“specific or existential indefinite introducing a new referent”, Baumann &
Riester 2012: 138), with the difference that there is no formal marker on nouns
for definiteness in Quechua. Annotating only according to referential givenness
means that if two expressions are not coreferential, an expression that is formally
partially the same as another having occurred earlier will not be counted as given
(e.g. kantu-chaw in first hana kantuchaw ‘at the upper border’ and later washa
kantuchaw ‘at the border over there’). Referential givenness is only applicable to
referring expressions: often a nominal sequence in the data, if it consists of only 
one NP such as hana kantuchaw, is referentially given or new only as a whole, since 
only the noun together with its modification denotes a single identifiable referent 
in the discourse. The two components hana and kantuchaw do not refer to separate
referents individually.³⁸⁶ So far, this leaves out another aspect of givenness,
called lexical givenness in Baumann & Riester (2012). Lexical givenness refers not 
to referents but only to expressions, even individual lexical items. An expression 
is lexically new if it occurs for the first time, and given if it, a synonym, a hyper- 
nym, or a holonym occurred previously in the discourse. There is a difference in 
temporal restriction from the treatment of referential givenness implemented by 
Baumann & Riester (2012: 144—145) and first postulated by van Deemter (1994): 
referring expressions are retained for a shorter while than their referents, and 
can thus become “new” again. An annotation according to lexical givenness allows 
us to annotate a givenness difference even within a nominal sequence that forms 
a single referring expression: thus, kantuchaw in washa kantuchaw is (lexically) 
given if hana kantuchaw occurred a short enough distance before. The distance 
chosen here was 5 utterances, which is similar to the 5 intonation phrases chosen 
by Baumann & Riester (2012), or else 25 seconds. Lexical givenness enters the data 
in Table 35 only as difference in givenness within a nominal sequence; i.e. some NPs 
that denote new referents might be partially made up of expressions that are given, 
like kantuchaw in washa kantuchaw in the situation just described. This is then 
counted as new-given (column 5 in Table 35). If the opposite order obtains (e.g., 
washa is lexically given because of a recent previous occurrence of washa ladu 
and then the referentially new NP washa kantuchaw is introduced), it is classed as 
given-new (column 4). The opposite situation, where a given referent is partially 
expressed via lexically new expressions, would have been treated in the same way 
but did not occur. In cases where lexical and referential givenness were completely 
opposed, a decision would have had to be made regarding which type of givenness 
would have been given precedence.³⁸⁷ However, this did not happen in the data
considered here. Modifying demonstratives and nominalized forms of the copula
were annotated according to the referential givenness of the whole expression (cf.
Baumann & Riester 2012: 143-144).

One of the main findings in the two tables is that phrases with final prominence
are overall the most frequent, making up nearly half (47.3%) of all observations.
This in itself is some evidence that final prominence is indeed a kind of default at
the level of the phrase, at least if we accept the assumed relation between high tone
position and prominence that is the basis of the prominence classification here.
The prevalence of this prominence pattern is also relevant if we recall the pattern
called phrase accentuation in the analysis of the Spanish data (cf. section 5.1.3.1).
The phrase accentuation of Spanish has also been shown to cue final prominence.
In terms of contour shape, it is very similar to a rise-fall contour of Quechua in
which the high tone is aligned with a syllabic position in the final word (column
h in Table 32, accounting for about a quarter of rising-falling contours on multi-word
phrases). Equally, falling contours with prefinal prominence seem not only
to resemble contours with postfocal deaccentuation in Spanish, but also to serve
somewhat similar functions if the findings here are correct. My aim here is not
to argue for one being the origin of the other, but to point out the structural and
realizational similarities. It seems plausible that the bilingual speakers here take
recourse to a core of cueing strategies available to them independent of the language
they speak as part of the repertoire of their speech community (which does
not preclude their also being typologically frequent), and which they further adapt
according to other more divergent structural requirements.


386 Note that Quechua here works differently from the languages Baumann & Riester (2012) base
their work on: they state that referential givenness can only apply at the level of the DP, which
is formally different from an NP in languages like English or German. Quechua has no formal
marking in this respect. On its own, kantuchaw or just kantu can be a full referring expression if
context allows only a single interpretation (e.g. there just being one border) and no overt specifier
such as hana is necessary. An analysis might assume that such a noun then has a covert specifier
and is hence a DP, but this could only be argued from the broader context, i.e. when it allowed us
to deduce the specificity of kantu in such a case. The DP assumption thus has to be argued somewhat
circularly. The point is that languages that, like Quechua, lack formal markers for definiteness
and related nominal categories suggest that the relation between information status and nominal
syntax is less clear-cut (“only DPs can be referring expressions”) than it would seem from the
perspective of languages that do have them.

387 When a given referent is wholly expressed via lexically new expressions, Baumann & Riester
(2012: 146, 150) hypothesize but cannot demonstrate, using their German corpus study, that referential
givenness should trump lexical newness insofar as the resulting expression should still be
deaccented. The opposite case is also unclear. Baumann & Riester (2012: 147, 150) hypothesize that
new referents expressed by given lexical items should also be deaccented, but again their German
data does not confirm this. An example in van Deemter (1994: 5) is interesting in this regard:


(i) Clinton visited many towns; when he finally arrived in Clinton, he was late.


Here, the first “Clinton” refers to the former US president and the second to a town of the same
name. In this case, despite the lexical givenness of the expression “Clinton”, its second occurrence
is thought to be obligatorily accented (van Deemter 1994: 5). This stands in contrast to an example
from Büring (2007: 448):


(ii) A: Why do you study Italian?
B: I'm married to an Italian.


In (ii), the second occurrence of “Italian” is thought to be deaccented because of its lexical givenness.
While van Deemter (1994: 5) takes the accent on the second “Clinton” to be due to the difference
in denotation between the two Clintons but also suggests that focus or contrast might play a
role, Baumann & Riester (2012: 136, footnote 13) propose that the information of there being a town
called Clinton has a degree of “extra newness”, overriding lexical givenness here. They provide a
further example:


(iii) Clinton shares his name with a town; when he finally arRIVED in Clinton, he was late.


Here, the second occurrence is thought not to be accented anymore. All of this seems to indicate
that referential newness can actually overrule lexical givenness on occasion, contra the hypothesis
by Baumann & Riester (2012).

For the counts in both tables, χ²-tests were done to check against the null
hypothesis that rows (prominence location) and columns (information structure /
status) are independent of each other. Fisher's exact tests were also done because
the expected values in some cells in both tables were low enough to affect the
reliability of the χ²-test (some were below 5, none were below 1; cf. Field et al. 2012:
816). For Table 34 (information structure), the two single counts for “final (corr)”
and “prefinal (corr)” were reassigned to “final (narrow)” and “prefinal (narrow)”.
The results of the χ²-test (χ² = 63.557, df = 8, p = 9.3×10⁻¹¹) and Fisher's exact test
(two-sided, p = 4.49×10?) are both highly significant. For Table 35 (information status),
the results of the χ²-test (χ² = 22.804, df = 6, p = 0.00086) and Fisher's exact test
(two-sided, p = 0.0006) are also both highly significant. This suggests that columns and
rows in both tables are not independent of each other, i.e. that both information 
structure and information status as annotated are associated with the position of 
prominence in the data. The adjusted standardized residuals for each cell reveal 
that only some counts contribute to the overall significance. For information struc- 
ture, among them are the counts for final prominence under narrow final focus (10 
counts including one from final (corr); z = 2.972, p < 0.01) and those for the inverse 
case, prefinal prominence under narrow prefinal at-issueness (17 counts including 
one from prefinal (corr); z = 6.638, p < 0.001, note that the expected value for this 
cell is just below 5 at 4.76, so the p-value is probably not fully reliable). This indi- 
cates that these respective information structural conditions both associate with 
the two different prominence patterns more frequently than expected under the 
null hypothesis of independence. 
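
The χ² statistic and the adjusted standardized residuals reported above can be recomputed from the counts in Table 34 once the two “corr” counts are folded into the “narrow” columns. The following is a minimal sketch in plain Python. The function name and data layout are assumptions for illustration, the software actually used for the analysis is not specified here, and Fisher's exact test is not reproduced. For a cell with observed count O, expected count E, row total R, column total C and grand total N, the adjusted residual is (O - E) / sqrt(E (1 - R/N) (1 - C/N)).

```python
import math

ROWS = ["final", "prefinal", "indeterminate"]
COLS = ["focus (broad)", "topic (broad)", "topic-comm",
        "prefinal (narrow)", "final (narrow)"]

# Table 34 with the two "corr" counts added to the "narrow" columns.
OBSERVED = [
    [50, 39, 1, 5, 10],   # final prominence
    [19, 7, 1, 17, 0],    # prefinal prominence
    [40, 22, 8, 2, 1],    # indeterminate
]


def chi_square_with_residuals(observed):
    """Return (chi2, df, expected counts, adjusted standardized residuals)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    expected, residuals = [], []
    for i, row in enumerate(observed):
        exp_row, res_row = [], []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (obs - e) ** 2 / e
            se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            exp_row.append(e)
            res_row.append((obs - e) / se)
        expected.append(exp_row)
        residuals.append(res_row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected, residuals


chi2, df, expected, residuals = chi_square_with_residuals(OBSERVED)
print(f"chi2 = {chi2:.3f}, df = {df}")
for name, exp_row, res_row in zip(ROWS, expected, residuals):
    cells = ", ".join(
        f"{z:+.3f}{'!' if e < 5 else ''}" for e, z in zip(exp_row, res_row)
    )
    print(f"{name:<14} {cells}")
```

Cells marked with "!" in the output are those whose expected count falls below 5, which is the condition that motivated running Fisher's exact test alongside the χ²-test.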


Table 36: Adjusted standardized residuals from the χ²-test of independence for
prominence position according to information structure (cf. Table 34, the “corr”
counts are here integrated into the “narrow” counts). Exclamation mark indicates
a cell where the expected value is <5; *, **, and *** indicate a p-value of <0.05,
<0.01, and <0.001, respectively.


                      focus      topic      topic-      prefinal        final
                      (broad)    (broad)    comm        (narrow)        (narrow)
final prominence      -0.418      1.994*    -2.417!*    -2.750**         2.972**
prefinal prominence   -0.877     -2.366*    -0.797!      6.638(!)***    -1.691!
indeterminate          1.188     -0.112      3.246**    -2.711**        -1.723!


Of the complementary conditions, final prominence under narrow prefinal at-issueness
(z = -2.75, p < 0.01) and prefinal prominence under narrow final at-issueness
(z = -1.691, p > 0.05), both with negative z-scores, only the former reaches
significance, while the latter rather narrowly misses it (also with an expected value of
only 2.2). Overall, this is some evidence that the position of a narrowly at-issue
constituent is reflected in the position of prominence, as expected: narrow
final at-issueness associates positively with final prominence, and narrow prefinal
at-issueness associates positively with prefinal prominence (with the caveat of the
expected value here being just below 5) as well as negatively with final prominence.
The counts do not fully allow us to conclude that narrow final at-issueness
also associates negatively with prefinal prominence, although they point in this
direction. In the case of broad topics, but not of broad focus, there is a positive
association with final prominence (z = 1.994, p < 0.05) and a negative one with prefinal
prominence (z = -2.366, p < 0.05). This is at least some evidence in the expected
direction, namely that a broad information structure on the noun sequence (no
internal division) associates with final prominence.
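
The significance marks attached to the residuals follow from treating each adjusted standardized residual as a standard-normal z-score and taking the two-sided tail probability. The helper below is a sketch of that conversion using only the standard library. The function names are assumptions for illustration.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


def stars(z):
    """Significance marks as used in Tables 36 and 37."""
    p = two_sided_p(z)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


for z in (1.994, -2.366, -2.75, 2.972, 6.638, -1.691):
    print(f"z = {z:+.3f}  p = {two_sided_p(z):.4f}  {stars(z)}")
```

With these thresholds, |z| has to exceed roughly 1.96, 2.58 and 3.29 for one, two and three stars respectively, which is why z = -1.691 is not marked as significant.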


Table 37: Adjusted standardized residuals from the χ²-test of independence for prominence
position according to information status (cf. Table 35). Exclamation mark indicates a cell where
the expected value is <5; *, **, and *** indicate a p-value of <0.05, <0.01, and <0.001, respectively.


                      all new     all given   given-new   new-given
final prominence      -3.107**     3.184**     0.046       0.717
prefinal prominence    2.154*     -1.507      -2.289*      1.487!
indeterminate          1.474      -2.106*      1.893      -2.025*


In the case of information status (Table 37), an unexpected result is that nominal 
sequences classed as completely new are significantly negatively associated with 
final prominence (z = -3.107, p < 0.01) and positively associated with prefinal prominence
(z = 2.154, p < 0.05). Somewhat more expected is that being completely
given is significantly positively associated with having final prominence (z = 3.184,
p < 0.01), and that a sequence with a given-new partition is negatively associated 
with prefinal prominence (z = -2.289, p < 0.05). The latter two results might be taken 
as some evidence that prominence is final by default, i.e. if nothing else intervenes, 
and that relative newness in the sequence attracts prominence. For the surprising 
result about the all-new sequences, recall that the annotation used a threshold of 5 
utterances or 25 seconds distance (adapted from a similar threshold in Baumann & 
Riester 2012) between two mentions of a word for it to be once again classed as 
lexically new instead of given. This means that some instances of noun phrases 
like primera fila-chaw ‘in the first row’ in Conc were counted as all-new because
the last mention of fila was beyond this limit. However, it is quite plausible that in
particular such frequent words that were essential to how some of the speakers 
played the Conc game were actually active for longer, possibly having a kind of 
“default” givenness status in this task. This suggests that the threshold for re-class- 
ing an expression as lexically new after it had previously already been given should 
probably be set at a further distance than the one proposed in Baumann & Riester 
(2012), at least for such types of rather repetitive conversational tasks. It might 
also be worth considering in further research whether different lexical items can
be taken to have different thresholds of this kind, depending on how central they
are to the type of conversational task at hand. The following will briefly explore
further possible factors.
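
The effect of the threshold on the lexical givenness annotation can be illustrated with a small sketch. It is a simplification under stated assumptions: only the distance in utterances is modelled (the alternative limit of 25 seconds, as well as synonyms, hypernyms and holonyms, are left out), a distance of exactly `max_gap` utterances still counts as given, and the toy dialogue is invented for the example.

```python
def annotate_lexical_givenness(utterances, max_gap=5):
    """Label each word form as lexically 'new' or 'given'.

    utterances: list of utterances, each a list of word forms.
    A form is 'given' if it last occurred at most max_gap utterances earlier.
    """
    last_seen = {}   # word form -> index of the utterance it last occurred in
    annotated = []
    for idx, words in enumerate(utterances):
        labels = []
        for word in words:
            prev = last_seen.get(word)
            given = prev is not None and idx - prev <= max_gap
            labels.append((word, "given" if given else "new"))
        for word in words:
            last_seen[word] = idx
        annotated.append(labels)
    return annotated


# Invented toy dialogue: "fila" recurs after seven utterances.
dialogue = [["primera", "fila"], [], [], [], [], [], [], ["segunda", "fila"]]

print(annotate_lexical_givenness(dialogue, max_gap=5)[-1])
print(annotate_lexical_givenness(dialogue, max_gap=10)[-1])
```

With the threshold of 5 utterances, the second fila is classed as new and the whole sequence as all-new. With a wider window it becomes given-new, which is the kind of reclassification the discussion above suggests for items that are central to a repetitive task.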

A partial explanation for the ambiguous findings regarding prosodic cues 
related to information status overall might be found when looking at individual 
examples. Once again, a difference seems to exist between speaker pairs in terms of 
their preferred strategy. Section 6.1.8.3 showed that the speaker pairs AZ23 & ZZ24, 
on the one hand, and QZ13 & OZ14, on the other, occupy opposite ends of a spectrum
with regard to how they treat Spanish-origin words. This could be demonstrated
particularly well because they both played Conc describing card locations
by referring to them via a kind of grid coordinate, made up of an ordinal number 
of Spanish origin plus a form of the Spanish-origin fila 'row' to specify the row, and 
then another instance of an ordinal number plus the locative -chaw to specify the 
column. In realizing these elements they also differ regarding the prosodic treat- 
ment of information status and/or structure. Over the course of their Conc game, 
the four speakers all uttered the sequence x fila(-chaw) several times (OZ14 5, QZ13
10, AZ23 13, ZZ24 3 times), where x stands for one of the Spanish ordinal numbers 
from one to three. Forms of fila were mostly (but not always, see above) lexically 
given because of their frequent occurrence, while the preceding ordinals prim- 
er(a)/segund@/tercer(a) were mostly lexically new, or, if they had recently occurred 
already, also given. In any case, given that these four speakers always began the 
specification of a location at which they wanted to guess the identity of an object 
with such an instance of x fila(chaw), in the context of how speakers played the 
game, the ordinal number x is the more informative part of the phrase, since 
fila(chaw) was highly expectable both from its position as immediate follower of 
the ordinal number (the only other attested option for following an ordinal number 
in these corpora is the locative suffix) and as part of the initial phrase for each new 
location-specifying move, since the specification of the column by numeral+chaw 
always followed after the row specification. In the information structural annotation,
these phrases were uniformly annotated as being broadly at-issue, because
assuming a more specific implicit QUD like IN WHICH ROW AND COLUMN IS WHICH
ITEM? instead of the broader WHERE IS WHICH ITEM? depends exactly on the task-specific
information status of items like fila, which is in question here. However, it
should be clear that in terms of both information status and structure, as well as 
regarding plausible assumptions about how speaker and hearer here expected the 
discourse to evolve given the absolute regularity of these sequences,³⁸⁸ the immediate
contexts for these utterances in both corpora (QZ13 & OZ14 and AZ23 & ZZ24)
are as similar as could be. With that in mind, it is remarkable that the two pairs 
realize these phrases in quite distinct fashion. QZ13 & OZ14 realize them either 
with two pitch peaks (“grafted” pattern) on the two Spanish stressed syllables (in
the numeral and the form of fila), plus a rise towards the end of the final syllable, 
or with just a rise or rise-fall on the Spanish stressed syllable of the numeral, with 
filachaw without its own pitch event (“inherited” pattern), i.e. in the latter case 
with a pattern counted as cueing prefinal main prominence (see Figures 158-160 
above). AZ23 & ZZ24, on the other hand, most often realize them with a rise that 
takes place on the penult of the second word, filachaw, or at the word boundary 
between the first and second word, i.e. with a pattern counted as cueing final prom- 
inence (see Figures 114 and 164 above). QZ13 & OZ14's realizations might be taken 
as evidence that they interpret the context as calling for narrow prefinal focus. 
Their realization with prefinal prominence could also mean that prosody in their 
case is sensitive to the relative givenness of filachaw compared to the preceding 
numeral. In turn, that would suggest that AZ23 & ZZ24's realizations are not sensitive
to it. Thus QZ13 & OZ14, who in terms of paying heed to Spanish stress positions
are more “Spanish-like” in their Quechua prosody, here behave less “Spanish-like”,
as relative givenness is generally thought not to be cued in Spanish (cf. Cruttenden 
2006; Hualde 2002; Hualde & Colina 2014). Conversely, AZ23 & ZZ24, who pay little 
heed to Spanish stress positions, would here be more “Spanish-like” in their insen- 
sitivity to relative givenness. Whatever explanation actually holds, it seems clear 
that under nearly identical context conditions, these two speaker pairs choose two 
systematically different prosodic realization strategies. 

Another aspect worth exploring is the treatment of forms of the nominalized
copula (ka-q-). Whether this is also a matter of individual preferences or something
else is not obvious here. In modifying prenominal position, both ka-q and
the demonstratives kay and tsay are almost never the location of the tonal transition,
i.e. they are either uniformly low or high, and many word boundary rises
occur at the word boundary after one of these modifiers.³⁸⁹ On the other hand,


388 Cf. Calhoun 2010a; Turnbull et al. 2015 on language-specific relations of predictability and 
consequent informativity to prosodic prominence, and Clopper et al. 2018 for indications that
“predictability” might not be a unified phenomenon.

389 The only case where a prenominal ka-q is not part of the initial low or high stretch is KP04
Conc 0324, kay kaq ladu kantu-chaw “at the border that is at this side”, where kay kaq ladu is
realized in a rise-fall phrase, with both the tonal transitions, the initial rise and the final fall, taking
place at the initial and final word boundary of kaq, respectively. The elevated pitch on kaq is also
clearly audible. I have no explanation for why this is the case.

with kaq in postnominal position as the last word in a phrase (more frequent than
its occurrence as prenominal modifier), some of the speakers (TP03, KP04, QF16,
SG15, ZR29) do not realize the final phrasal tonal transition on forms of ka-q. They
realize the final rise or fall on the preceding content noun, so that the resulting
phrase is counted here as cueing prefinal prominence (18 instances in total, see
Figures 116, 126, 127 above for examples). Of those, 14 tokens are in sequences
classed as all-new, because of the referential newness of the entire sequence. Other
speakers (OA32, LC34), however, do realize the final movement on forms of kaq or
at its initial word boundary (8 instances in total, see Figure 128 for an example).
Of those, only 5 are in sequences classed as all-new. This is not a clear-cut difference
between speakers because KP04, SG15 and LC34 also each realize ka-q in the
respective other way at least once. The contexts in which ka-q occurs are also much
more heterogeneous than is the case with x fila(-chaw). In all, only five ka-q-final
sequences in the whole corpus are classed as all-given, in contrast to 24 classed
as all-new (of which 14 have prefinal prominence, 5 final prominence and 5 are
indeterminate). This could suggest that the function of ka-q is somehow related
to introducing new referents; see Bendezú Araujo (2021) for more on its meaning.
That kaq-final sequences with pre-final prominence are so much more often new
rather than given, together with the described tendency by a majority of speakers
to not extend the phrasal high tone to it, goes some further way in explaining
why all-new sequences are associated with prefinal prominence. The behaviour of
ka-q and the demonstratives seems to indicate that a lexical factor is also involved
in determining which part of a phrase the high tone is realized on, possibly also
subject to individual variation.

The foregoing discussion suggests that in the data summarized in Tables 34-37,
speaker- and likely also task-specific strategies interacting with lexical factors are
hidden. If the behaviour of each individual speaker were observed on more data,
individually regular strategies for the cueing of prominence and its relation to
information status and information structure would likely emerge. This task is left
to future research.

Overall, the results in this section support the hypothesis that the relation
between prosodic cues, metrical structure, and information structure in Huari
Quechua is quite as indirect and distributional as discussed in section 3.7.3 for
Spanish and Cuzco Quechua. They also support the hypothesis that the high
portion of phrasal contours is associated with prominence, and that the prominence
profile of a phrase provides cues to its information structure and the information
status of the elements it is composed of. In the next section, this main
finding will be used for the OT-analysis of the Huari Quechua contours. Section
6.4 will then look at individual examples in context, expanding the discussion to
sequences containing verbs, and use findings from this section as a guide for the
analysis of how information structure relates to prosody in those examples.

The entire analysis in this section has involved several steps of somewhat sub- 
jective interpretation in the annotation process. I think that the approach is still jus- 
tified, since it allows for some exploratory generalizations despite the spontaneous 
nature of the data. Hopefully, they can also serve as a basis for the better formulation
of testable hypotheses in future research. 


6.3 OT-Analysis 


In this section, the tonal alignment patterns will be translated into OT constraints 
and thereby related to each other as well as to Huari Spanish intonation, providing 
the second half of the answer to question (36)b, the first half of which was given 
in section 5.3. I will derive constraint rankings generating all attested phrasal con- 
tours and alignment variants. Beginning with the rise-falling contours (6.3.1), it will 
be established that alignment with the word boundary (cf. section 6.1.2) and align- 
ment with the word penult (cf. section 6.1.4) require different rankings, thus con- 
stituting a word boundary-pattern and a word penult-pattern, respectively, while 
alignment with the phrase boundary (cf. section 6.1.3) occurs in both of them to dif- 
ferent degrees. The variation in alignment of the final peak in the phrase between 
the penultimate and final syllable (cf. section 6.1.6) will be shown to naturally arise 
from the constraint rankings for the word boundary- and the word penult-patterns. 
The rising and the only-falling contours will be covered in sections 6.3.2 and 6.3.3, 
and the “inherited” and “grafted” patterns in 6.3.4. The rankings moving from 
the word boundary-variant via the word penult-variant to the loanword variants 
describe a progression from a purely edge-oriented prosody to one where promi- 
nent syllabic positions become successively more integrated, the opposite direction 
of the one in which we progressed in the Spanish OT analysis (section 5.3). This is 
one of the points of connection to the Spanish analysis, but not the only one. We 
already saw that certain contours exist in both Huari Spanish and Quechua. Up to 
a certain point, they can be interchangeably generated by several different con- 
straint rankings. Thus the tonal grammar of the speaker community can be shown 
to have both language-specific peripheries and a common central space. 

Starting with assumptions, I assume that a metrical relation between words 
in a phonological phrase exists which makes exactly one word more prominent 
than the others (cf. section 3.3.1). I also assume that as a default, it is the rightmost
word that is strongest in the metrical representation. Default here means what is to 
be understood or expected in the absence of signaling to the contrary (cf. Calhoun 
2010). Such a representation is ambiguous with regards to information structure, 
as it might indicate an answer to a broad QUD or one in which only the rightmost 
element is at-issue (cf. sections 3.7.3, 6.2.3.3). See section 6.2.3.3 for evidence and 
a more detailed argumentation. I assume that the location of the high tone in a 
contour cues prominence, in the following way (cf. section 6.2.3.3): 

In rising contours: the most prominent word in the phrase is cued by the rising 
tonal transition taking place on a syllabic position on that word or at the left boundary
of that word. That is to say, if the rise takes place on the initial of two words in
a phrase, and the final word is realized with a high plateau, then the initial word 
is taken to have higher prominence. But if it occurs at the word boundary between 
them or anywhere on the final word, the final word has the higher prominence. 

In rise-falling contours: the most prominent word in the phrase is again cued 
by the rising tonal transition taking place on a syllabic position on that word or at 
the left boundary of that word. That is to say, in a 3-word phrase where the rise 
takes place at the left boundary of the middle word and the fall takes place on its 
penult, the middle word is taken to be prominent. If the rise takes place at the left 
boundary of the final word and the fall on its penult, the final word is taken to be
prominent. The analysis here omits phrases where the rise takes place at the begin- 
ning of or within one word and the fall on a later one. As the findings in section 
6.2.3.2 indicate, they are very infrequent variants (category g) in Table 32, 2 tokens) 
and could also consist of a rising phrase plus a falling one. 

In only-falling contours: the most prominent word is that inside or at the right 
word boundary of which the fall occurs. 
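
The three cueing assumptions just stated can be written down as a small decision procedure. The following Python sketch is only an illustration: the function name prominent_word, the encoding of a tonal transition as a pair of word index and place, and the place labels are assumptions made for this example and are not part of the annotation scheme.

```python
from typing import Optional, Tuple

# A tonal transition is located by (word index, place), counting words from 0.
PLACES = {"left_boundary", "inside", "right_boundary"}
Location = Tuple[int, str]


def _word_of(loc: Location, kind: str) -> int:
    """Map a transition onto the word it is taken to cue."""
    word, place = loc
    if place not in PLACES:
        raise ValueError(f"unknown place: {place}")
    if kind == "rise" and place == "right_boundary":
        return word + 1  # same point as the left boundary of the next word
    if kind == "fall" and place == "left_boundary":
        return word - 1  # same point as the right boundary of the previous word
    return word


def prominent_word(contour: str,
                   rise: Optional[Location] = None,
                   fall: Optional[Location] = None) -> Optional[int]:
    """Index of the word cued as most prominent, or None for phrase types
    that the analysis leaves out."""
    if contour == "rise":
        # The rise on a word, or at its left boundary, cues that word.
        return _word_of(rise, "rise")
    if contour == "rise-fall":
        # Phrases with the rise on one word and the fall on a later one are omitted.
        r, f = _word_of(rise, "rise"), _word_of(fall, "fall")
        return r if r == f else None
    if contour == "fall":
        # The fall inside a word, or at its right boundary, cues that word.
        return _word_of(fall, "fall")
    raise ValueError(f"unknown contour: {contour}")


# Two-word rising phrase: a rise at the boundary between the words cues the final word.
print(prominent_word("rise", rise=(1, "left_boundary")))
```

A rise on the initial word of a two-word phrase returns 0, while a rise at the word boundary or anywhere on the final word returns 1, in line with the description of rising contours above.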

Regarding which forms are actually attested, I use the counts given in Tables 
30-33 from section 6.2.3.2. For the input forms of Quechua, I assume the penult to 
be marked as the prominent syllable in a word (consisting of a stem plus suffixes) 
underlyingly, but this only has an effect in the ranking for the word penult-pattern, 
showing that the Quechua tonal grammar pays varying attention to stress in the 
sense of the typology by Hyman (2014). Strictly speaking, if a speaker produced only
utterances in the word boundary-variant, there would be no grounds to assume 
the presence of a prominent position at the word level in their grammar at all. 
However, in our data, no speaker is consistent in this (cf. section 6.2.3.2). I therefore 
take the penult to be prominent, but this prominence is solely expressed by serving 
as an alignment anchor in the word penult-variant. I assume the three contours, 
the rising-falling, the rising, and the only-falling one, to be different in their tonal 
input. The rising-falling contour is assumed to have the input tonal sequence L_Lt
H L_Rt (with L_Lt the left and L_Rt the right L tone), the rising contour L_Lt H,
and the only-falling contour H L_Rt.

For reasons of space, the analysis will use only phrases containing two words.
It easily extends to phrases of only one word because in those cases “internal” 
alignment is outranked by alignment with the phrase edges, and what would have 
to be considered for longer phrases will be remarked on individually. The following 
are the alignment constraints³⁹⁰ to build the varying rankings from (all either taken
or slightly adapted from Gussenhoven 2000a, 2004): 


(154) ALIGN (L_Rt, Rt_φ): align the right L tone with the right edge of the phonological
phrase


(155) ALIGN (L_Lt, Lt_φ): align the left L tone with the left edge of the phonological
phrase


(156) ALIGN (H, Rt_φ): align the H tone with the right edge of the phonological
phrase


(157) ALIGN (H, Lt_φ): align the H tone with the left edge of the phonological phrase


(158) ALIGN (H, Lt_ω): align the (left edge of the) H tone with the left edge of a
prosodic word³⁹¹


(159) ALIGN (H, Lt_ω'): align the (left edge of the) H tone with the left edge of the
strongest prosodic word in PhP


(160) ALIGN (H, Rt_ω'): align the (right edge of the) H tone with the right edge of the
strongest prosodic word in PhP


(161) ALIGN (H, σ', Rt): align the right edge of H with the right edge of a stressed
syllable


(162) ALIGN (H, σ', Lt): align the left edge of H with the left edge of a stressed
syllable


These constraints are automatically in competition with each other. Their competi- 
tion will generate both the observed long low stretches and high plateaux and the 


390 The constraints used for the Quechua analysis were mostly already introduced in the Spanish 
OT-analysis (section 5.3). I repeat them here for convenience. 

391 The complementary constraint aligning the H with the right edge of the prosodic word was not
found to have an effect in any of the contours. 
tonal transitions between them. In order for them to be able to do this, we furthermore
need three high-ranked faithfulness constraints ranked above all of them.


(163) LINEARITY: the sequence of tones in the output is the same as in the input
(164) MAXIO(T): every tone in the input has a correspondent in the output
(165) NOCROWD: a TBU has only one tone


The TBU is assumed to be the syllable here. The puzzle about cases where more 
than one tone occupies a single syllable (cf. the discussions in sections 6.1.7 and 
6.1.8.1) will be mostly left aside. We would need further systematic data to really 
decide this point. In principle, an interesting solution might be that deciding 
between moras and syllables as TBU is an additional variable choice for speakers. 
In both cases, NOCROWD would be kept high-ranked. Another observed effect of
tonal organization under temporal constraints is already integrated in the ranking
here: NOCROWD is ranked above all alignment constraints, necessary for the “displacement”
effect observed in section 6.1.3, but underneath MAXIO(T), so that when
a phonological phrase is mapped onto a monosyllabic word, the tones will be real- 
ized despite crowding. However, since there are also attested cases where tones 
are clearly truncated because of crowding, this is likely not the whole story. Recall 
that Huari Spanish also shows variability in tonal realization under temporal con- 
straints (section 5.3.3). 
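
The interaction of the faithfulness constraints on a monosyllabic phrase can be illustrated with a minimal Python sketch. It assumes that a candidate is a tuple with one string of tones per TBU, that no tones are inserted, and that strict domination is evaluated by comparing violation tuples from left to right. The function names and the small candidate set are hypothetical and serve only to show why MAXIO(T) above NOCROWD lets both tones of a rising contour surface on a single syllable.

```python
def is_subsequence(sub, full):
    """True if the tones in sub occur in full in the same order."""
    remaining = iter(full)
    return all(tone in remaining for tone in sub)


def violations(input_tones, output):
    """Violation profile (LINEARITY, MAXIO(T), NOCROWD) of one candidate.

    output holds one string of tones per TBU, e.g. ("LH",) for a rise
    realized on a single syllable.
    """
    realized = "".join(output)
    linearity = 0 if is_subsequence(realized, input_tones) else 1
    max_io = len(input_tones) - len(realized)  # unrealized input tones
    no_crowd = sum(max(0, len(tbu) - 1) for tbu in output)
    return (linearity, max_io, no_crowd)


def winner(input_tones, candidates):
    # Tuples compare lexicographically, which mirrors strict domination
    # with LINEARITY and MAXIO(T) ranked above NOCROWD.
    return min(candidates, key=lambda out: violations(input_tones, out))


# Rising contour (input L H) on a phrase consisting of one monosyllabic word.
candidates = [("LH",), ("L",), ("H",), ("HL",)]
print(winner("LH", candidates))
```

Under these assumptions the crowded candidate with both tones wins, because deleting a tone incurs a MAXIO(T) violation that outweighs the NOCROWD violation. The attested truncations mentioned above are not captured by this sketch.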

Regarding association constraints, the findings on phonetic enhancement in 
section 6.1.6 suggest that the H tone at least in the rising-falling contours associates 
some of the time, but not that the stressed penult always associates with a tone (so
that σ' ← T is never active, unlike in Spanish, cf. 5.3.2). For such cases, three association
constraints are needed:


(166) H → TBU: the H tone is associated with a TBU


(167) (σ')_ω' ← T: associate a tone with the stressed syllable of the metrically strong
prosodic word in a phonological phrase


(168) NOASSOC: TBUs are not associated with tones


Their ranking must be H → TBU >> NOASSOC >> (σ')_ω' ← T to conform to the observations.
H → TBU and NOASSOC likely have overlapping distributions (Boersma &
Hayes 2001), so that sometimes, the H tone fails to associate, because not every
syllable on which the peak is realized is longer than adjacent syllables. In the
loanword patterns, (σ')_ω' ← T is then also promoted above NOASSOC, because of the
observed lengthening of Spanish stressed syllables in them. These association con- 
straints do not interfere with the alignment constraints generating the different 
contour variants discussed in the following, so they are not given in the rankings. 
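
What overlapping ranking distributions amount to can be sketched in Python in the spirit of stochastic OT (Boersma & Hayes 2001). The ranking values below are invented for illustration and are not estimated from the Huari data; the noise standard deviation of 2.0 and the function name share_outranking are likewise assumptions of this sketch. At each evaluation, Gaussian noise is added to every ranking value, and the constraint with the higher resulting value dominates.

```python
import random

# Hypothetical ranking values on an arbitrary scale (not fitted to any data).
RANKING_VALUES = {
    "H->TBU": 100.0,     # the H tone is associated with a TBU
    "NoAssoc": 98.0,     # TBUs are not associated with tones
    "(s')w'<-T": 88.0,   # associate a tone with the stressed syllable of the strong word
}


def share_outranking(higher, lower, noise_sd=2.0, trials=10_000, seed=1):
    """Share of evaluations in which `higher` ends up above `lower`."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a = RANKING_VALUES[higher] + rng.gauss(0.0, noise_sd)
        b = RANKING_VALUES[lower] + rng.gauss(0.0, noise_sd)
        wins += a > b
    return wins / trials


# Close ranking values: the order is reversed in a sizeable minority of evaluations.
print(share_outranking("H->TBU", "NoAssoc"))
# Distant ranking values: the order is practically fixed.
print(share_outranking("NoAssoc", "(s')w'<-T"))
```

With values this close, H → TBU dominates NOASSOC in most but not all evaluations, which corresponds to the H tone associating only some of the time, whereas the low-ranked third constraint practically never becomes active.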


6.3.1 Rise-falling contours 


The progression through the analysis of the variants begins with the rising-falling 
contours. Since they include one more tone than the other two contours, their anal- 
ysis establishes the basic mechanisms that can then be adapted to the analysis of 
the bitonal contours (sections 6.3.2 and 6.3.3). 


6.3.1.1 Word boundary-variant 

In all the constraint rankings here, the three faithfulness constraints are ranked
above the alignment constraints. For rise-falls, the highest-ranked alignment constraints,
ALIGN (L_Rt, Rt_φ) and ALIGN (L_Lt, Lt_φ), establish the final and initial L tone as
final and initial phrasal boundary tones, respectively. They are followed by the constraints
aligning the H tone with the boundaries of the prominent prosodic word,
ALIGN (H, Lt_ω') and ALIGN (H, Rt_ω'). Their ranking ensures that in this variant, a high
plateau-like realization will form on that word. The relative ranking of ALIGN (L_Rt,
Lt_φ) >> ALIGN (H, Rt_φ) below them ensures that in phrases in which a prefinal word
is prominent, a low stretch will form on all words following the prominent word.
ALIGN (H, Lt_ω) >> ALIGN (L_Lt, Rt_φ), the next constraints in the ranking, are relevant
for phrases with more than two words, which are not treated in the tables here:
their ranking results in the plateau-like realization extending further to the left
of the prominent word, either up to the first internal word boundary, as attested
e.g. in Figure 126, or to another prefinal one, as e.g. in Figure 127.³⁹² The three
constraints below have no influence in this ranking; their relative ranking results
from how they are ranked in the other variants. That I leave them ranked relative
to each other at all is because I want to change as little as possible between the
variant rankings.


392 How the difference between these two could be captured is not covered here. Note that in both 
examples, the initial low stretch extends across the demonstratives, and the plateau begins with 
the first content word. 
(169) Alignment constraint ranking for the word-boundary variant (with multiple
alignment of the H tone) of Quechua (rise-falls)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (L_Rt, Rt_φ) >> ALIGN (L_Lt, Lt_φ) >>
ALIGN (H, Lt_ω') >> ALIGN (H, Rt_ω') >> ALIGN (L_Rt, Lt_φ) >> ALIGN (H, Rt_φ) >>
ALIGN (H, Lt_ω) >> ALIGN (L_Lt, Rt_φ) >> ALIGN (H, Lt_φ) >> ALIGN (H, σ', Lt) >>
ALIGN (H, σ', Rt)
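
The effect of ranking (169) can be checked mechanically. The following Python sketch is not a reproduction of the tableaux: it assumes a simplified candidate set in which every syllable of an eight-syllable, two-word phrase carries exactly one of the three tones in the order L H L (so that the three faithfulness constraints are satisfied by construction), and it counts alignment violations as distances in syllables. The names profile, winner and render, and the ASCII constraint labels (phi for the phonological phrase, w for a prosodic word, w' for the strongest prosodic word, s' for a stressed syllable), are illustrative assumptions.

```python
from itertools import combinations

N = 8                      # syllables in the phrase
WORDS = [(1, 4), (5, 8)]   # first and last syllable of each prosodic word
STRESSED = [3, 7]          # penult of each word

# Ranking (169), highest-ranked constraint first.
RANKING_169 = [
    "L_Rt,Rt_phi", "L_Lt,Lt_phi", "H,Lt_w'", "H,Rt_w'", "L_Rt,Lt_phi",
    "H,Rt_phi", "H,Lt_w", "L_Lt,Rt_phi", "H,Lt_phi", "H,s',Lt", "H,s',Rt",
]


def profile(cand, strong):
    """Violations (in syllables) of candidate (a, b): the left L covers
    syllables 1..a, H covers a+1..b, the right L covers b+1..N.
    strong is the index of the prominent word."""
    a, b = cand
    first, last = WORDS[strong]
    return {
        "L_Rt,Rt_phi": 0,               # right L always ends at syllable N
        "L_Lt,Lt_phi": 0,               # left L always starts at syllable 1
        "H,Lt_w'": abs(a + 1 - first),
        "H,Rt_w'": abs(b - last),
        "L_Rt,Lt_phi": b,               # left edge of right L (b + 1) minus 1
        "H,Rt_phi": N - b,
        "H,Lt_w": min(abs(a + 1 - f) for f, _ in WORDS),
        "L_Lt,Rt_phi": N - a,
        "H,Lt_phi": a,                  # left edge of H (a + 1) minus 1
        "H,s',Lt": min(abs(a + 1 - s) for s in STRESSED),
        "H,s',Rt": min(abs(b - s) for s in STRESSED),
    }


def winner(ranking, strong):
    """Candidate whose violations are lexicographically smallest under ranking."""
    def key(cand):
        p = profile(cand, strong)
        return [p[name] for name in ranking]
    # All (a, b) with 1 <= a < b <= N - 1, so that each tone gets a syllable.
    return min(combinations(range(1, N), 2), key=key)


def render(cand):
    """Tone per syllable, with a space at the word boundary."""
    a, b = cand
    tones = "L" * a + "H" * (b - a) + "L" * (N - b)
    return tones[:4] + " " + tones[4:]


print(render(winner(RANKING_169, strong=1)))  # final word prominent
print(render(winner(RANKING_169, strong=0)))  # prefinal word prominent
```

Under these assumptions the final-prominent phrase comes out as LLLL HHHL, with the rise at the word boundary and the H ending on the penult because the final syllable is taken by the right L tone. The prefinal-prominent phrase comes out as LHHH LLLL, with the H displaced by one syllable from the left phrase edge and a low stretch over the following word.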


The low ranking of the syllabic alignment constraints (161) and (162) means that 
the word-boundary pattern does not make reference to a syllabic position at all. 
This is because section 6.2.3.2 established that rise-fall phrases are unattested that 
combine a rise taking place at a syllabic position within a word with a fall taking 
place at a boundary between the words, or vice versa, except for the case where the 
rise takes place at the boundary preceding the final word, and the fall taking place 
on that word's penult or final syllable ((153)a), cf. Table 32). This seeming exception 
results from the same ranking: the fact that the fall at the right edge of the H tone in
rise-falling contours often aligns with the penult of the final word, even if the rise
at its left edge aligns with the preceding word boundary, simply falls out from the
high-ranking constraint aligning the L tone with the right edge of the phrase, ALIGN
(L_Rt, Rt_φ), in combination with NOCROWD (cf. winning candidate a) in Table 38 and
the left contour in Figure 135 for an example). In individually-phrased words, any 
of the variant rankings ((169), (170), (173)) will result in at least the penult (plus 
possibly preceding syllables) being high for the same reason, as attested (cf. section 
6.1.5). Bisyllabic individually-phrased words are a special case, because they don't 
ever seem to realize a tritonal contour. Solving this is not trivial. In principle, I 
would like to propose that an OCP-constraint is active and ranked above the alignment
constraints, but below NOCROWD. This would ensure that the rising-falling
contour on such phrases is always realized bitonally with H and L_Rt, because
even though the alignment of the L tones is ranked higher than that of H, the high-ranking
OCP-constraint will prevent candidates in which H is not realized, and the ranking
of ALIGN (L_Rt, Rt_φ) over ALIGN (L_Lt, Lt_φ) ones in which L_Rt is not realized. However,
MAXIO(T), which has to be ranked above NOCROWD as pointed out above because of
observed bitonal contours on monosyllabic phrases, would prevent any effects of 
the OCP constraint. Recall from section 5.3.3 that Cho & Flemming (2015) found con- 
tinuously variable tonal reduction under continuously variable time constraints 
(increased speech rate) in Korean, but categorical tonal deletion of the second L in 
APs with less than four syllables (normally with an LHLH contour). This seems com- 
parable to our case of bisyllabic phrases: one of the tones of the tritonal contour is 
categorically not realized. That the deleted tone is not the H is likely truly due to 
an OCP-effect as described, but we have to assume that the deletion occurs prior to 
the step when the alignment constraints take effect (Kiparsky 2015), since for them 
MAXIO(T) must be high-ranking, so that its interaction with NOCROWD produces the
other observed continuous crowding and compression phenomena. While some- 
thing like this must be going on, truly solving this puzzle must be a task for the 
future. This is yet another case showcasing the complexities of phenomena sur- 
rounding tonal crowding, truncation and deletion, and their categorical vs. contin- 
uous expression. 

Returning to the main discussion, the same ranking (169) also produces attested 
rise-falling contour variants when the initial word is prominent in the phrase, with 
the parallel displacement of the H tone by one TBU away from the left word boundary,
now due to the high-ranking ALIGN (L_Lt, Lt_φ) ensuring that the phrase-initial
TBU is occupied by the L tone (cf. winning candidate d) in Table 39, exemplified
on individually-phrased words in Figures 144 and 146). In the tableaux, brackets 
mark the extent of the prosodic words, curly brackets the phrase. Accents mark the 
strongest element within its domain (syllable within prosodic word and prosodic 
word within phonological phrase). Dashed lines signify alignment of a tone with a 
syllable, dashed arrows indicate multiple alignment. Black dots are tonal targets. 


Table 38: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for
the word-boundary variant with multiple H tone alignment of a rise-falling Quechua phonological
phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word
prominent in the phrase.


Even for the attested phrase pattern where both the rise and fall only take place
at the end of the phrase (e.g. Figures 132, 175), instead of the rise occurring already
at the beginning of the final word, no reference needs to be made to any syllabic
position within a word, as winning candidate b) in Table 40 shows. The only
difference between the rankings (169) and (170) is that in (170), the constraints
ALIGN (H, Lt_ω') and ALIGN (H, Lt_ω), high- and mid-ranking, respectively, in (169),
have been downranked, effectively preventing the H tone from also aligning leftwards
and creating a plateau. Note that the constraint ALIGN (H, Rt_ω'), seeking to
align the H tone with the right edge of the prominent word, is still active in (170).

Table 39: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the
word-boundary variant with multiple H tone alignment of a rise-falling Quechua phonological phrase
(φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the prefinal word
prominent in the phrase.

ALIGN (H, Rt_ω') here prevents ALIGN (L_Rt, Lt_φ) from pushing further to the left, thus
enabling ALIGN (L_Lt, Rt_φ) to push rightwards and create the observed low stretch,
even though it is lower-ranked than ALIGN (L_Rt, Lt_φ). This ranking without reference
to a stressed penult must exist, because the “displaced” pitch peaks (cf. section
6.1.3) cannot be generated from a ranking in which tones align with the word penult.
Thus this must be one of the rankings that create a phrase-final rise-falling contour 
in Quechua. A similar contour is generated with pitch accents and boundary tones 
in Huari Spanish (section 5.3.2), and it is also generated in the word penult-pattern 
(cf. section 6.3.1.2). 


(170) Alignment constraint ranking for the word-boundary variant (without
multiple alignment of the H tone) of Quechua, rise-falls (changes from (169):
ALIGN (H, Lt_ω') and ALIGN (H, Lt_ω) are downranked)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (L_Rt, Rt_φ) >> ALIGN (L_Lt, Lt_φ) >>
ALIGN (H, Rt_ω') >> ALIGN (L_Rt, Lt_φ) >> ALIGN (H, Rt_φ) >> ALIGN (L_Lt, Rt_φ) >>
ALIGN (H, Lt_φ) >> ALIGN (H, σ', Lt) >> ALIGN (H, σ', Rt) >> ALIGN (H, Lt_ω') >>
ALIGN (H, Lt_ω)
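
The step from (169) to (170) is a pure reranking, which the following Python sketch makes explicit. As a simplifying assumption that is not part of the analysis itself, candidates are eight-syllable, two-word phrases in which every syllable carries exactly one tone in the order L H L, and violations are counted as distances in syllables. The function names and ASCII constraint labels (phi for the phonological phrase, w for a prosodic word, w' for the strongest prosodic word, s' for a stressed syllable) are illustrative.

```python
from itertools import combinations

N = 8                      # syllables in the phrase
WORDS = [(1, 4), (5, 8)]   # first and last syllable of each prosodic word
STRESSED = [3, 7]          # penult of each word

RANKING_169 = [
    "L_Rt,Rt_phi", "L_Lt,Lt_phi", "H,Lt_w'", "H,Rt_w'", "L_Rt,Lt_phi",
    "H,Rt_phi", "H,Lt_w", "L_Lt,Rt_phi", "H,Lt_phi", "H,s',Lt", "H,s',Rt",
]
# (170): the two constraints aligning the left edge of H with a word edge
# are moved to the bottom; everything else keeps its relative order.
DEMOTED = ["H,Lt_w'", "H,Lt_w"]
RANKING_170 = [c for c in RANKING_169 if c not in DEMOTED] + DEMOTED


def profile(cand, strong):
    """Violations of candidate (a, b): left L on 1..a, H on a+1..b, right L on b+1..N."""
    a, b = cand
    first, last = WORDS[strong]
    return {
        "L_Rt,Rt_phi": 0,               # right L always ends at syllable N
        "L_Lt,Lt_phi": 0,               # left L always starts at syllable 1
        "H,Lt_w'": abs(a + 1 - first),
        "H,Rt_w'": abs(b - last),
        "L_Rt,Lt_phi": b,
        "H,Rt_phi": N - b,
        "H,Lt_w": min(abs(a + 1 - f) for f, _ in WORDS),
        "L_Lt,Rt_phi": N - a,
        "H,Lt_phi": a,
        "H,s',Lt": min(abs(a + 1 - s) for s in STRESSED),
        "H,s',Rt": min(abs(b - s) for s in STRESSED),
    }


def winner(ranking, strong):
    def key(cand):
        p = profile(cand, strong)
        return [p[name] for name in ranking]
    return min(combinations(range(1, N), 2), key=key)


def render(cand):
    a, b = cand
    tones = "L" * a + "H" * (b - a) + "L" * (N - b)
    return tones[:4] + " " + tones[4:]


print(render(winner(RANKING_170, strong=1)))  # final word prominent
print(render(winner(RANKING_170, strong=0)))  # prefinal word prominent
```

In this simplified setting the plateau disappears: with final prominence the H tone is confined to the penult of the final word (LLLL LLHL), and with prefinal prominence to the final syllable of the prefinal word (LLLH LLLL), with the stressed penult remaining low.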


That ALIGN (H, Rt_ω') is still ranked so high has consequences that only emerge when
a prefinal word instead of the final one is prominent in the phrase. Table 32 in 
section 6.2.3.2 counts 8 tokens where both the rise and the fall take place on either 
the penult or the final syllable of a prefinal word. Looking at these examples indi- 
vidually, the majority realize the high tone on either of these syllabic positions, but 
not both. That is to say, in one variant the rise takes place at the left edge of the 
penult, and the fall at its right edge, while in the other, the rise takes place at the 
left edge of the final syllable and the fall at its right edge, which is also the word
boundary. As an example of the latter variant, see TP03 Conc Q 0917 (Figure 172),
as an example of the former, HA30 Conc Q 2398 (Figure 173).

Table 40: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for
the word-boundary variant without multiple H tone alignment of a rise-falling Quechua phonological
phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word
prominent in the phrase.


(171) TP03 Conc Q 0917 
hatun runa 
big person 
“big person” 


Figure 172: TP03_Conc_Q_0917³⁹³ (part of a declarative with a rise-falling contour).


393 https://osf.io/8nvrq/ 

Figure 173: HA30_Conc_Q_2398³⁹⁴ (part of a declarative with a rise-falling contour).


(172) HA30_Conc_Q_2398 
wayi-n-man yayku-ykaa-q
house-3-DEST enter-PROG-AG 
“entering her house” 


The variant produced by TP03 is what the ranking (170) predicts when applied to a 
phrase in which a prefinal word is prominent: ALIGN (H, Rt_ω') makes sure that the H
tone occupies the final syllable position in the prominent prefinal word, but ALIGN
(L_Lt, Rt_φ) is unimpeded in its effort to push the phrase-initial L tone as far right as
possible until it hits upon the H tone occupying the final syllable, causing the H
tone to be restricted to only that TBU (cf. winning candidate f) in Table 41). In particular,
the stressed penult does not attract the H tone and is instead realized as part
of the low stretch produced by ALIGN (L_Lt, Rt_φ). The fact that this variant is attested
corroborates the validity and plausibility of ranking (170) for both finally-promi- 
nent and non-finally-prominent rise-falling phrases without multiple H alignment. 
The rising contour exhibits similar variation (cf. section 6.1.3). A similar analysis to 
represent it is made in section 6.3.2. For the variant as produced by HA30, where 
the H tone is restricted to the penult of the prefinal word, a ranking is needed in 
which alignment to the prominent syllabic position is high-ranked. This will come 
next. It seems plausible to take the competition between these rankings, leading to 
variable alignment with either the final syllable or the penult on several occasions
in the overall data, to be one of the reasons for the quantitative findings on variable
peak alignment in section 6.1.6. 


Table 41: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for
the word-boundary variant without multiple H tone alignment of a rise-falling Quechua phonological
phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the prefinal
word prominent in the phrase.


6.3.1.2 Word penult-variant

The previous rankings, in which alignment is only oriented towards prosodic 
edges, cannot generate examples like Figure 173. For them, a ranking is needed in 
which the two constraints militating for alignment of the right and left edge
of the H tone with the right and left edge of the prominent syllable, ALIGN (H, σ',
Rt) and ALIGN (H, σ', Lt), are promoted to a higher position. This is done in ranking
(173). Winning candidate b) in Table 42 on phrases with final prominence is again
the phrase-final rise-falling contour, also generated in the “phrase accentuation”
variant of Huari Spanish (cf. sections 5.3.2, 5.3.3), and from the word-boundary
ranking without multiple H tone alignment (170) on finally prominent phrases,
candidate b) in Table 40. Note again that the H tone here might well associate, as it
does in the Spanish contour, but this is not relevant for generating the contours (see
above). ALIGN (H, Rt_ω') here is kept at a relatively high position, only having moved
ALIGN (H, σ', Rt) and ALIGN (H, σ', Lt) above it. This keeps the difference between
(170) and (173) minimal and also has the desired effect of keeping the high tone
within the prominent word in winning candidates, even though ALIGN (H, σ', Rt)
and ALIGN (H, σ', Lt) are not specified for this.³⁹⁵


(173) Alignment constraint ranking for the word-penult variant of Quechua (rise- 
falls) 
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (Log; Rtg) >> ALIGN (Lop, Lty) >> 
ALIGN (H, o^ Lt) >> ALIGN (H, o, Rt) >> ALIGN (H, Rt) >> ALIGN (Loa Lto) >> 
ALIGN (H, Rt) >> ALIGN (Lor; Rtg) >> ALIGN (H, Lto) >> ALIGN (H, Lt,,) >> ALIGN 
(H, Lt„) 


Table 42: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-penult variant of a rise-falling Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word prominent in the phrase.

395 With a change in relative ranking between ALIGN (Lore Rts) and ALIGN (H, Ltọ), the very infrequently attested variant in which the rise takes place on the penult of one word and the fall on that of a later word (combination g) in (153) and Table 32 in section 6.2.3.2) would be the winning candidate, indicating perhaps that their relative ranking distributions overlap just enough to occasionally produce the rare variant (cf. Boersma & Hayes 2001). Another option for keeping the H tone within the prominent word might be to say that it associates secondarily with that word, as Roettger (2017) does for Tashlhiyt Berber, but this is disallowed in Gussenhoven (2004).

Applying the same constraint ranking to phrases in which the prefinal word is prominent yields winning candidate c) in Table 43, exemplified by HA30_Conc_Q_2398 (Figure 173), the variant realization corresponding to candidate f) in Table 41.

Table 43: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-penult variant of a rise-falling Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the prefinal word prominent in the phrase.

The idea here has been to keep the analysis of the word-penult pattern as close as possible to the spirit of a tonal grammar that pays more heed to phrases and edges than to words and prominent syllables. In this sense, this is an analysis of the word-penult pattern that is minimal in its concession to the penult as a prominent position in the word. That is in keeping with the characterization of Quechua prosody in section 6.1 as making some reference to a prominent syllabic position, but minimizing the effects of this. The analysis is kept so far on this “accentless” side of things not because I think that more accent-oriented grammars are unavailable to the speakers, but because I want to map out the space of grammatical possibility that the Huari speakers can take recourse to by tracing its peripheral boundaries. It is clear that the speakers have access to the grammars represented by the rankings (169) and (170), which do not need to make reference to a prominent syllabic position at all. From the Spanish OT-analysis it is clear that they also have access to the other end of the spectrum, the “main variant” of Spanish, where even stressed syllables in nonprominent words in the phrase form pitch accents. In between, they have at their disposal a variety of only subtly different options, both at the level of the contours produced and at the level of the constraint rankings used to generate them. This middle ground is accessible from either direction and could therefore be seen as belonging less to either language than to the structural communicative resources available to the speakers of this community. In the following, the constraint rankings will be adapted to rising contours.


6.3.2 Rising contours 


For the rising contours, the rankings for the rise-falling contours can be adapted. The global difference is that all constraints making reference to the right L tone have no target here and are therefore omitted. In addition, the constraint aligning the H tone with the right phrase boundary, ALIGN (H, Rtφ), is promoted to a position where it is not dominated by any alignment constraint that conflicts with it. Thus we have a phrase-initial L tone and a phrase-final H tone, and the remaining constraints generate the attested contours. Together with the faithfulness constraints, this ensures that bisyllabic phrases will always realize the initial syllable low and the final one high, as attested (see Figure 141 and the discussion in section 6.1.5), even under the word-penult variant ranking (176). Since the promotion of ALIGN
(H, Rt„) means that ALIGN (H, Rt,,) will never be able to differentiate between any 
candidates that comply with ALIGN (H, Rt,), i.e. all attested candidates, it has also 
been omitted here. 


6.3.2.1 Word boundary-variant 

With these changes, the ranking for the word-boundary variant of the rising 
contour with multiple H tone alignment, parallel to that of the rise-falls (169), is the 
following: 


(174) Alignment constraint ranking for the word-boundary variant (with multiple 
H tone alignment) of Quechua (rises) 
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (H, Rt;) >> ALIGN (Lor; Lt) >> ALIGN 
(H, Lt,) >> ALIGN (H, Lt,,) >> ALIGN (Lore Rt4) >> ALIGN (H, Lt;) >> ALIGN (H, 6, 
Lt) >> ALIGN (H, o; Rt) 


This ranking correctly selects the winning candidate b), in the case where the final 
word is prominent in the phrase, and e) when the prefinal word is prominent (see 
Tables 44 and 45, respectively). 
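
How a ranking of this kind selects a winner can be made concrete with a minimal sketch of strict-domination evaluation. The constraint labels (C1 to C3), the candidate labels and the violation counts below are hypothetical placeholders and are not taken from Tables 44 and 45; only the evaluation logic is standard OT: candidates are compared on the highest-ranked constraint first, and lower-ranked constraints are consulted only to break ties.

```python
from typing import Dict, List, Tuple


def violation_profile(ranking: List[str], violations: Dict[str, int]) -> Tuple[int, ...]:
    """Order a candidate's violation counts from highest- to lowest-ranked constraint."""
    return tuple(violations.get(constraint, 0) for constraint in ranking)


def evaluate(ranking: List[str], candidates: Dict[str, Dict[str, int]]) -> str:
    """Return the label of the optimal candidate under strict domination.

    Tuples compare lexicographically, so a single violation of a higher-ranked
    constraint is worse than any number of violations of lower-ranked ones.
    """
    return min(candidates, key=lambda label: violation_profile(ranking, candidates[label]))


# Hypothetical toy data (NOT the violation marks of the tableaux in the text).
TOY_CANDIDATES = {
    "a": {"C2": 1},
    "b": {"C1": 1},
    "c": {"C3": 3},
}

if __name__ == "__main__":
    # Re-ranking the same constraints changes the winner.
    print(evaluate(["C1", "C2", "C3"], TOY_CANDIDATES))
    print(evaluate(["C3", "C1", "C2"], TOY_CANDIDATES))
```

The point of the sketch is only that the variants discussed in this section differ in the order of the same constraints, not in the constraint set itself.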

In sections 6.1.3 and 6.1.5, it was already discussed that there are two variants 
of the rising contour when the rise takes place at the end of the phrase-final word: 
in one, the high tonal target is clearly realized already on the penult, in the other, 
the penult is realized as low and only the final syllable is high (e.g. (101)/Figure 
118 vs. (118)/Figure 138, and examples for both in Figure 142, cf. also the sche- 
matic difference in Figure 137). This latter variant is generated by ranking (175), 
parallel to ranking (170) for the rise-falling contours. In it, ALIGN (H, Lt„) has been 
ranked down, allowing ALIGN (Lor; Rt») to have a much stronger effect. This yields 
the desired result, generating winning candidate c) in Table 46 under the condi- 
tion of the final word in the phrase being prominent, but it poses something of a 
conundrum when applied to a phrase in which the prefinal word is prominent (see 
below). It is also something of a deviation since it ranks ALIGN (H, Lt) down, which 
comes back up again for the word-penult pattern. 

Table 44: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-boundary variant with multiple H tone alignment of a rising Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word prominent in the phrase.

Table 45: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-boundary variant with multiple H tone alignment of a rising Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the prefinal word prominent in the phrase.


(175) Alignment constraint ranking for the word-boundary variant (without
multiple H tone alignment) of Quechua (rises)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (H, Rtg) >> ALIGN (Lor; Lty) >> ALIGN 
(Loro Rtg) >> ALIGN (H, Lto) >> ALIGN (H, o, Lt) >> ALIGN (H, o, Rt) >> ALIGN 
(H, Lt) >> ALIGN (H, Lt.) 


Table 46: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-boundary variant without multiple H tone alignment of a rising Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word prominent in the phrase.

In the case of a prefinal prominent word, the ranking does not yield a separate contour variant. Instead, it again selects candidate c), meaning that this ranking does not differentiate between final and prefinal prominence via the intonational contour³⁹⁶ (the tableau has been omitted here to save space). In comparison, in the rise-falling contours, the ranking always differentiates according to prominence via separate contours. However, the distribution of rising contour variants given in Table 30 in section 6.2.3.2 supports the analysis here: the sixth contour that is “missing” from the comparison with the other variants, in which the rise takes place not on the penult but on the final syllable of a prefinal word (candidate f), occurs only when that word is an oxytonic Spanish loan, so that the position of the rise is lexically specified (it is also very rare). The gap in the paradigm of contours generated from the rankings thus corresponds to the one found in the data. Whether this ranking occasionally really produces contours in which the position of the H tone is misaligned with the prominent word, or whether some process acts to counteract this, remains unknown.

396 No other ranking of the existing constraints could be found that would have been able to distinguish between the prominence configurations. That is because the ranking promotes ALIGN (Lor; Rt4) to a nearly undominated position in order to produce the attested winning candidate c). A theoretical possibility would have been to create a new constraint, like e.g. ALIGN (Loi; Rt), and to rank it above ALIGN (Lg, Rt). That would have produced candidate b) in the case of prefinal prominence and still kept c) as the winning candidate in the case of final prominence. However, it would have meant both introducing a new constraint purely for this case and making b) in turn ambiguous between a reading in which the final word is prominent and one in which the prefinal word is prominent. Since it would not have generated an attested candidate not generated otherwise, and since candidate f) is only attested in the “inherited” pattern and therefore cannot be the target contour here, the addition of such a constraint would be spurious and would also not reduce the number of ambiguous contours.


6.3.2.2 Word penult-variant

For the ranking generating contours in the word penult-variant, ALIGN (H, Lt„) is promoted back to position 5 among the alignment constraints, while ALIGN (H, σ́, Lt) and ALIGN (H, σ́, Rt) are ranked above it. This parallels the change from the word boundary- to the word penult-variant in the rise-falling contours, but note that here it is ALIGN (H, Lt,,), and not ALIGN (H, Rt,,), which is needed to ensure that the rise takes place in the prominent word. Ranking (176) produces winning candidate a) in Table 47 under the condition that the final word is prominent, the direct variant to c), which is derived from ranking (175). Under the condition that the prefinal word is prominent, the ranking also generates the attested winning candidate d) in Table 48.


(176) Alignment constraint ranking for the word-penult variant of Quechua (rises)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (H, Rtg) >> ALIGN (Lor; Lty) >> ALIGN
(H, o, Lt) >> ALIGN (H, o, Rt) >> ALIGN (H, Lte) >> ALIGN (Loi Rt) >> ALIGN
(H, Lto) >> ALIGN (H, Lt,,)


6.3.3 Only-falling contours


The analysis for the only-falling contours is complementary to that of the rises in terms of which constraints are used, but otherwise runs entirely in parallel. Here,
the constraints making reference to the left L tone are omitted and ALIGN (H, Lt) is ranked directly underneath ALIGN (Lox, Rtg), so that all winning contours will start high at the left edge and be low at the right edge, with what happens in between determined by the other constraints. Again complementary to the rises, the interaction of these constraints with NOCROWD and the other high-ranking faithfulness constraints also ensures that on bisyllabic phrases the left syllable is always high and the right always low, and here this does not run counter to the penult being occupied by the high tone in the word-penult variant. Here it is ALIGN (H, Lt) which does not have any distinguishing effect between winning candidates, so it will be omitted. On the other hand, ALIGN (H, Rt„) plays the crucial role of making sure that the high tone is realized within the prominent word here, just as in the rise-falling contours. The only-falling contour is the least frequent contour in our data, particularly on phrases consisting of more than a single word, as Table 31 in section 6.2.3.2 shows. This makes some of the expected alignment combinations in multi-word phrases exceedingly rare. In particular, there is not a single token of an only-falling contour on a multi-word phrase in which the fall takes place on the penult of the prefinal word. Additionally, no cases were found in which the H tone did not extend at least to the right boundary or the penult of the prominent word, so that a separate ranking for a word boundary-variant without multiple H alignment is not included here. No attested only-falling contours specifically necessitate the word penult-variant at all, since on phrase-final words the word boundary-variant and the word penult-variant result in superficially the same contour. The ranking for the word penult-variant is nevertheless given here. More data are needed to see whether these gaps persist or are accidental. The low frequency of the only-falling contour could suggest that it is simply an apheretic variant of the rise-falling contour, but it does not occur exclusively when there are insufficient phrase-initial TBUs on which to realize the initial L tone, as a number of examples attest.³⁹⁷ Because the treatment of the only-falling contours is entirely parallel to that of the rises, I will not give tableaus here and merely state the rankings for the two alignment variants retained.

Table 47: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-penult variant of a rising Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the final word prominent in the phrase.

Table 48: OT-Tableau with the constraint rankings to arrive at the correct alignment behaviour for the word-penult variant of a rising Quechua phonological phrase (φ) containing two prosodic words (ω) consisting of four syllables (σ) each, with the prefinal word prominent in the phrase.


(177) Alignment constraint ranking for the word-boundary variant of Quechua 
(only-falls) 
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (Log; Rtg) >> ALIGN (H, Lto) >> ALIGN 
(H, Rt) >> ALIGN (Loa, Lt) >> ALIGN (H, Rtg) >> ALIGN (H, Lto) >> ALIGN (H, o^, 
Lt) >> ALIGN (H, o; Rt) 


397 Cf. the examples in section 6.1.5 showing that the two falling variants collapse on individually phrased bisyllabic words, with the initial L being omitted, but from trisyllabic words onwards, both variants exist side by side.

(178) Alignment constraint ranking for the word-penult variant of Quechua (only-
falls)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (Lon; Rtg) >> ALIGN (H, Lt») >> ALIGN 
(H, o, Lt) >> ALIGN (H, o, Rt) >> ALIGN (H, Rta) >> ALIGN (Lr; Rtg) >> ALIGN 
(H, Lto) >> ALIGN (H, Lt,,) 


6.3.4 The marked loanword patterns and moving towards Spanish 


Turning to the patterns found only on Spanish-origin words, the “inherited” pattern will be generated by the rankings for the word penult-variant, but with a specification in the lexical entry of the word for which syllable is stressed that differs from the default penult specification of Quechua words. In this respect, the difference between the word penult- and the “inherited” variant has more to do with differences in lexical entries than with tonal alignment. However, section 6.1.8.3 showed that speakers differ non-categorically in whether they align according to the Spanish stress position or the penult default in realizations of the same lexical word. Since all speakers are bilingual and do not stress words “wrongly” in Spanish, the difference in tonal alignment position can hardly be due to a difference between speakers in the stress position stored in the lexical entries for the same item. It can, however, be captured by introducing two new constraints that seek to align the left and right edge of the H tone with the left and right edge, respectively, of the lexically accented, as opposed to the regularly stressed, syllable. This captures the different natures of the truly lexically specified stress position in Spanish loanwords and the entirely regular penult word stress of native Quechua words.


(179) ALIGN (H, σLEX, Lt): Align the left edge of the H tone with the left edge of the
lexically accented syllable


(180) ALIGN (H, σLEX, Rt): Align the right edge of the H tone with the right edge of
the lexically accented syllable


If these are each ranked above their counterparts ALIGN (H, σ́, Lt) and ALIGN (H, σ́, Rt) in a word-penult ranking, the result will be the “inherited” pattern. If they are ranked below them, the normal word penult-pattern results. The respective ranking distributions of these constraint pairs must lie at different distances from each other for each speaker, making it more or less likely for a given speaker to produce one of the variants preferentially. However, since section 6.1.8.3 also showed a clear difference between lexical items in the production of the marked loanword patterns, the true explanation for the variation here cannot be restricted to a difference between speakers only.
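
The notion of ranking distributions at different distances can be illustrated with a small simulation in the spirit of Stochastic OT (cf. Boersma & Hayes 2001): each constraint has a ranking value, Gaussian noise is added at every evaluation, and the constraints are then ordered by their noisy values. This is only a sketch under stated assumptions. The labels ALIGN_LEX and ALIGN_STRESS are shorthand introduced here for the lexical-accent pair (179)/(180) and the regular-stress pair respectively, the ranking values are invented rather than fitted to the Huari data, and the noise standard deviation of 2.0 is simply the value conventionally used in that model.

```python
import random
from typing import Dict, List, Optional


def sample_ranking(ranking_values: Dict[str, float],
                   noise_sd: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Draw one total ranking: add Gaussian noise to each value, sort high to low."""
    rng = rng or random.Random()
    noisy = {c: v + rng.gauss(0.0, noise_sd) for c, v in ranking_values.items()}
    return sorted(noisy, key=noisy.get, reverse=True)


def share_lex_first(ranking_values: Dict[str, float],
                    trials: int = 10000,
                    noise_sd: float = 2.0,
                    seed: int = 0) -> float:
    """Proportion of evaluations in which ALIGN_LEX outranks ALIGN_STRESS.

    Under the analysis in the text, these are the evaluations that would
    yield the "inherited" pattern rather than the regular word-penult pattern.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        order = sample_ranking(ranking_values, noise_sd, rng)
        if order.index("ALIGN_LEX") < order.index("ALIGN_STRESS"):
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Two hypothetical speakers with different distances between the values.
    print(share_lex_first({"ALIGN_LEX": 101.0, "ALIGN_STRESS": 100.0}))
    print(share_lex_first({"ALIGN_LEX": 96.0, "ALIGN_STRESS": 100.0}))
```

A between-speaker difference then corresponds to different ranking values per speaker. Capturing the differences between lexical items noted above would need something beyond this sketch, which has no item-specific component.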
Regarding the “grafted” pattern, I said in section 6.1.8.2 that the additional pitch accent tones LH* are specified separately and have the same constraint ranking as in the Spanish “main” variant, which outranks that of the Quechua phrasal tones. That also means assuming that the H tone here associates. That ranking is repeated here in (181), adapted so that its tones are specified as being from Spanish (Hs and Ls) and the alignment and association constraints refer to the lexically accented syllable.


(181) Constraint ranking for the “main” variant of Spanish adapted for the “grafted” 
accent of Quechua 
ALIGN (Hg, 0173) >> (O1gy)u € Ts >> Hs > TBU >> Orgx € T >> ALIGN (Ls, Rtg) >> 
ALIGN (Ls, Lty) >> ALIGN (Hs, Lt;) >> NOASSOC >> ALIGN (Hg, Rtg) 


These constraints can now be inserted into the ranking for the Quechua word-pe- 
nult variant, above all except the faithfulness constraints. 


(182) Alignment constraint ranking for the “grafted” accent pattern on Spanish-
origin words in Quechua (rise-falls)
LINEARITY, MAXIO(T), NOCROWD >> ALIGN (Hs, 0; 7x) >> (OLExw< Ts >> Hs > 
TBU >> a; cx € T >> ALIGN (Ls, Rtg) >> ALIGN (Ls, Lte) >> ALIGN (Hs, Lte) >> 
NoAssoc >> ALIGN (Hs, Rte) >> ALIGN (Lor, Rtg) >> ALIGN (Lors Lto) >> ALIGN 
(H, o, Lt) >> ALIGN (H, o, Rt) >> ALIGN (H, Rt.) >> ALIGN (Lort, Lto) >> ALIGN (H, 
Rt;) >> ALIGN (Lar; Rtg) >> ALIGN (H, Lte) >> ALIGN (H, Lt,,) >> ALIGN (H, Lt.) 


In (182), the constraints from the Spanish main variant are in bold. This combined ranking will generate the attested contour of the “grafted” pattern, because the alignment of the differentially specified Spanish pitch accent tones outranks that of the Quechua phrasal tones. LINEARITY need not be assumed to apply to the whole sequence LsHs LoHoLg, only to those tones that belong to the same source, in keeping with the analysis for IP-level boundary tones for Spanish (section 5.3.3). The analysis further supports the relational underspecification of tones at a relevant level of representation, since the L that is phrase-initial in all other Quechua variants is phrase-medial here.³⁹⁸ That the Spanish-origin stressed syllable is differentially specified as a lexical accent position captures the insight gained from section 6.1.8.2 on how the loanword patterns resemble lexical accent systems in other languages. The analysis of the “grafted” pattern further demonstrates how aspects of the Quechua and Spanish tonal grammars can be integrated in the middle ground of the prosodic possibility space that the speakers can take recourse to. Unlike the phrase-final rise-fall contour, which is a convergent intonational option accessible to the speakers via at least three different constraint rankings (two from Quechua, one from Spanish), the “grafted” pattern represents another kind of option. It is a nonconvergent exploitation of the possibilities of two different tonal grammars: a lexically specified Spanish accent position uncoupled from the three-syllable window together with a peripherally determined Quechua default prominence position; Spanish pitch accents together with Quechua phrasal tones. It results in something that is best characterized as belonging to the repertoire of these speakers, rather than to either language.

398 Alternative analyses would involve either crossed association lines or the stipulation of an otherwise unattested H*L pitch accent, and a more complex integration of the different constraints into a single ranking.


6.4 Context-based analysis of prosodic and other cues 
to information structure 


Section 6.2.3.3 presented findings that default prominence in nominal sequences is phrase-final, compatible both with broad focus contexts and with other information structural configurations, and that narrow final and narrow prefinal focus associate significantly with final and prefinal prominence, respectively. The analysis was restricted to sequences consisting of nominal elements in order to obtain a more homogeneous dataset. This section (6.4) will qualitatively explore further cases, extending the analysis to entire utterances that also involve verbs and to units larger than individual phrases. I will begin by showing that final main prominence is also the default in verbal utterances from Cuento (section 6.4.1). That section will establish the prosody of “broad focus” in the sense of asserting larger chunks of information with no internal IS-partition in longer utterances. Describing the intonation appropriate to such contexts is also important because it establishes a default and contrast foil for turns which do contain internal informational structuring because of discourse contexts involving more negotiation, as in the subsequent sections on utterances from Maptask (section 6.4.2) and Conc (section 6.4.3). The examples investigated here corroborate the hypothesis that the right boundary of (narrowly or broadly) focused constituents tends to align with the right boundary of a phrase (a broad cross-linguistic tendency, cf. Féry 2013 and the discussion in section 3.7.3). The coherence of individual PhPs in a group and their prominence profile are argued to be signaled via tonal scaling. In groups of PhPs for which context suggests no internal IS-partition (broad focus), local pitch span on prefinal PhPs is reduced compared to that on the final PhP, thus cueing rightmost prominence (section 6.4.1). In groups for which context does suggest an internal IS-partition (prefinal narrow focus), peak height and pitch level are reduced (downstepped) after the prefinal PhP whose right boundary corresponds to the right boundary of the focus domain (section 6.4.2). This cueing of prominence across individual PhPs constitutes grounds for assuming a prosodic grouping at a higher level, but this grouping is cued not via a separate set of boundary tones, but only via differences in scaling. This suggests a prosodic structure in which the PhP is recursive. Besides investigating the prosodic cues for prominence above the level of individual PhPs, this entire section (6.4) is also concerned with how prosodic cues to information structure interact with cues from other domains (word order and morphology). This interaction will be considered throughout, but it will come particularly to the fore in section 6.4.3. Having established prosodic cues for final and prefinal prominence on units larger than individual PhPs as a contrast foil in sections 6.4.1 and 6.4.2, on data from Cuento and Maptask respectively, I will use utterances from Conc in section 6.4.3 to demonstrate how cues from prosody, word order, morphology and context interact in complex negotiations of information. It will be shown that they can align or misalign in doing so. I will conclude by arguing that none of the different formal cues directly “means” categories of information structure, each obeying formal constraints specific to its domain, but that their interaction is exploited to navigate subtle information structural differences in complex contexts.
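
The scaling cue described above can be sketched as a simple comparison of local pitch spans. The sketch assumes that span is measured in semitones between the lowest and highest F0 value of a PhP, which is a common convention but is an assumption here and not necessarily the measure used in this study. The function names and the F0 values in the usage example are invented for illustration.

```python
import math
from typing import List, Tuple


def span_semitones(f0_min_hz: float, f0_max_hz: float) -> float:
    """Pitch span between two F0 values in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


def cues_rightmost_prominence(phrases: List[Tuple[float, float]]) -> bool:
    """True if every prefinal phrase has a smaller pitch span than the final one.

    Each phrase is given as (f0_min_hz, f0_max_hz). This mirrors the
    broad-focus pattern described in the text: reduced span on prefinal PhPs.
    """
    spans = [span_semitones(lo, hi) for lo, hi in phrases]
    return all(span < spans[-1] for span in spans[:-1])


if __name__ == "__main__":
    # Invented values: two compressed prefinal PhPs followed by a wider final PhP.
    group = [(180.0, 200.0), (175.0, 195.0), (150.0, 230.0)]
    print([round(span_semitones(lo, hi), 2) for lo, hi in group])
    print(cues_rightmost_prominence(group))
```

Downstep after a prefinal focus would instead show up as a lowering of both values of the later phrases, which this span-only comparison deliberately does not model.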


6.4.1 “Broad focus” in Cuento


This section will consider data from Cuento, the task in which speakers hear a recorded story and then retell it to each other. The Cuento corpora are the most narrative corpus type covered in this study. The active speaker in this task has relatively great liberty to tell the story they heard from the recording or from the other speaker in whichever way they choose; they can pace themselves and rarely need to anticipate interruptions or challenges to how they build up the common ground. The discourse progresses at the speed of their choice. Because in a narrative format like Cuento there is very little risk of being misunderstood or contested, and speakers have comparatively ample time to plan ahead, they can in effect unilaterally determine what content should be added to the common ground and in what units. They can minimize elements that are not at-issue and are intended to help specify and negotiate which material should enter CG in what way (i.e. elements intended for common ground management instead of content, cf. Krifka 2007). Therefore they can to a certain extent forego IS-related internal prosodic structuring of utterances and instead increase the use of prosody for demarcating maximal coherent units of material intended to enter CG in one go (matching large prosodic units to conversational moves). Because of this, the intonation observed in Cuento exemplifies how new information is asserted in large chunks without the anticipation of problems or the need to negotiate as reflected in internal IS-partition, i.e. what is often called “broad focus”. We will see how default rightmost prominence is cued via scaling across units of more than one individual PhP to demarcate those larger chunks corresponding to the broad focus domains. This constitutes prosodic phrasing at a level above (minimal) PhPs. In the absence of a need for marking internal IS-partitions, phrasing can also sometimes come to be employed to cue high-level syntactic structure like relative constructions or subordinations.³⁹⁹ We will see how this plays out in the first sequence from AZ23 & ZZ24's Cuento.


6.4.1.1 First Cuento sequence: Establishing a default for broad focus 


(183) AZ23_Cuent_Q_0104-0274⁴⁰⁰ (context for Figures 174 and 175)⁴⁰¹

time        AZ23
(seconds)

10.4        (L H--) ((H-- )
            unay-shi ka-naa huk
            long.ago-REP COP-PST.REP one
            they say a long time ago there was a

12.1        L
            uhm

12.6        (L H----------)
            hampikuq runa
            healer person
            healer

13.6        [(L H )]
            allaapa yacha-naa hampi-ku-y-ta
            much know-PST.REP heal-MID-INF-OBJ
            [who] knew a lot [about] healing

16.2        (L H--)(L H---)
            tsay-qa huk dia-qa aywa-n
            DEM.DIST-TOP one day-TOP go-3
            then one day he went

18.3        hampi-y ashi-q
            heal-INF search-AG
            looking for medicine

19.6        (LH )
            karu-pa
            far-GEN
            far away

21.2        tsay-chaw-shi
            DEM.DIST-LOC-REP
            then

22.9        LHL
            tsaka-y
            get.dark-INF
            getting dark

24.0        H
            tar-

24.3        (L H---------)
            tsaka-pa-ku-ski-n
            get.dark-ITER-MID-ITER-3
            it is getting dark

25.7        ![(L 1 ! | LH )]
            mana-raq hampi-y-man chaa-r-nin
            no-CONT heal-INF-DEST arrive-SUBID-3
            [with him] not yet having come upon the medicine


399 This claim is related to, yet different from, that by Féry & Ishihara (2010: 37) that in an “all-new sentence” (one with broad focus), “the formation of prosodic phrases as well as the tonal pattern and scaling depend entirely on the morpho-syntactic structure”. Here we will see instead that scaling in particular can cue broad focus on an utterance irrespective of syntactic factors, e.g. whether a sentence is verb-final or object-final. The effect of syntactic structure is instead observable sometimes in the boundary location of individual PhPs that form a larger unit whose coherence is signaled by scaling and which is the domain of broad focus.

400 The first line for each turn gives a basic intonational analysis, with low (L) and high (H) tones aligned at the position in the line below where their targets are first realized. H tones form plateaux rightwards only if indicated by a dashed line, L tones form low stretches by default. Round brackets () indicate phrase boundaries of PhPs/APs, square brackets [] indicate the boundaries of larger units. Where it isn't quite clear whether individual phrases should really be analyzed as such, because of severely reduced scaling (see text), the boundaries and tones in doubt are in grey. The exclamation mark (!) before a phrase indicates downstep of the entire phrase (pitch level), the upside-down exclamation mark (¡) indicates upstep.

401 https://osf.io/m397j/

Figure 174: AZ23_Cuent_Q_0126.⁴⁰² Cf. (183) for the context.


In (183), nearly every turn corresponds to the assertion of a new proposition. Nearly
all of the propositions are predicated of the protagonist, who, once introduced
(at 12.6), is implicitly understood as the topical subject and, as the most prominent
discourse referent, is referred to without a (pro)nominal expression. In terms of the
model of implicit QUDs by Riester and colleagues (cf. section 3.7.2), the utterances
in (183) all answer a QUD like WHAT HAPPENED (NEXT)? or WHAT DID THE PROTAGONIST
DO?, with almost no material repeated from one utterance to the next (i.e. all
material is new). Thus these utterances almost all have the broadest possible focus 
domain. 

Figure 174 (at 12.6 and 13.6) and Figure 175 (at 24.3 and 25.7) are cases
in point for how high-level prosodic phrasing is cued in such broad focus
utterances. The boundaries of high-level syntactic units are demarcated via a similarly
high level of prosodic phrasing: the separation between the head noun hampikuq
runa and its (quasi-)relative phrase allaapa yachanaa hampikuyta in Figure 174,
and that between the two verb phrases tsakapakuskin and manaraq hampiyman
chaarnin in Figure 175, are both cued by short breaks in the flow of speech. In both
cases the first and shorter unit is realized with a rising contour signaling contin- 
uation; the second and longer one with a rise-fall signaling finality. Scaling at the 
level of local pitch span acts to cue coherence across lower-level phrases: each 
individual word in both of the larger phrases could be assumed to form a PhP with 
a rising contour. This is evidenced by low tonal targets at the beginning of each 
of the prefinal words, followed by small rises and then a return to low towards 


402 https://osf.io/dmf6éu/ 

Figure 175: AZ23_Cuent_Q_0243.⁴⁰³ Cf. (183) for the context.


the beginning of the next (cf. also Figure 171 in section 6.2.3 and the discussion 
there - the analytical uncertainty due to the reduced scaling is marked in (183) by 
setting the corresponding tones and phrase boundaries in grey). On the final word 
in both cases, hampikuyta and chaarnin, pitch events are unambiguously
identifiable. Whether we analyze individual PhPs on each word or not, prosodically the
two prefinal words in the larger phrases are clearly cued to be subordinate to the 
final one via local pitch scaling. Thus in both cases all three words are part of a 
larger unit with final prominence (parallel to the findings in section 6.2.3). The 
pattern of a pronounced rise-fall on the final word, dwarfing any preceding pitch 
movement in terms of local scaling, is found in the majority of longer utterances 
in this Cuento corpus.⁴⁰⁴ Often, as here, the prefinal phrases in the larger unit are
rising, if identifiable, and the final one is rising-falling. Since the context clearly 
suggests broad focus, it is assumed that such a larger unit with a single promi- 
nence profile can in fact be made up of PhPs with different tonal contours, as here. 
Such sequences were classed as indeterminable with regard to their prominence
profile in section 6.2.3.3. 
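
The comparison of local pitch spans described above can be made concrete with a small sketch. It is an illustration only, under several assumptions that are not part of the analysis in the text: each PhP is represented by a handful of F0 samples in Hz, local pitch span is taken as the interval between the lowest and highest sample of a phrase in semitones, and the function names and F0 values are hypothetical (the values are not measurements from the corpus).

```python
import math


def semitones(f_ref_hz, f_hz):
    """Signed interval from f_ref_hz to f_hz in semitones."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


def local_pitch_span(f0_hz):
    """Local pitch span of one phrase: lowest vs. highest F0 sample, in semitones."""
    return semitones(min(f0_hz), max(f0_hz))


def final_span_dominates(phrases):
    """Return (flag, spans); flag is True if the last phrase has the largest span."""
    spans = [local_pitch_span(p) for p in phrases]
    return all(s < spans[-1] for s in spans[:-1]), spans


# Hypothetical F0 samples in Hz (NOT measured from the corpus): two prefinal
# phrases with small rises, followed by a final phrase with a large rise-fall.
unit = [
    [180.0, 190.0, 182.0],
    [181.0, 192.0, 183.0],
    [180.0, 240.0, 170.0],
]
flag, spans = final_span_dominates(unit)
print(flag, [round(s, 2) for s in spans])
```

In this toy case the final phrase dwarfs the prefinal ones in local span, which is the configuration the text associates with a larger unit with final prominence. Whether the prefinal movements count as separate PhPs at all is left open here, as it is in the text.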

This global contour shape cueing rightmost prominence across a large unit 
seems unaffected by syntactic differences in the sentences corresponding to it, 
but some of these differences are nonetheless reflected in the prosody. In Figure 
175 the first verb is syntactically the main verb, the second, chaa-r-ni-n, is marked 


403 https://osf.io/9h8u3/ 
404 Cf. section 5.1.3.1, where very similar context conditions were argued to favour the Spanish 
phrase accentuation, which is also formally similar to the broad focus contours discussed here. 
morphologically as subordinate by the morpheme -r. They are phrased separately,
and their subordinating relationship is likely reflected in the downstep
(in terms of pitch level) between the first and second verb phrase in Figure 175, 
which does not occur in Figure 174. Yet the syntactic structure of the sentence in 
Figure 175 does not result in a global contour signaling prefinal prominence (on 
the main verb): the rising contour shape on the main verb phrase suggests con- 
tinuation, and even though it is scaled higher in terms of pitch level than the final 
subordinate verb phrase, it is the latter which is realized with the final rise-fall, 
also scaled larger in terms of local pitch span. Highest prominence is therefore 
still final in the entire utterance in Figure 175. This suggests that different scaling 
cues (pitch level compared between phrases vs. pitch span on local pitch events) 
can signal different structures: some phrasing matching relevant internal syntactic
partitions while at the same time signaling no internal IS-partition (= broad
focus). We will see this again in the next examples. For global contour shape,
word order also does not seem to matter. In Figure 174 the final word is the direct 
object, while in Figure 175 it is the verb, yet the contour is the same and context 
suggests broad focus in both cases. In Figure 174, the utterance does not map 
onto one sentence, but onto the continuation of one, plus another. (183) shows that
its first part is the completion of a sentence begun at 10.4: unayshi kanaa huk...
uhm... hampikuq runa “a long time ago there was a... uhm... healer person”,
with a filled pause occurring between the numeral huk and the noun it modifies. 
The second part, allaapa yachanaa hampikuyta, could be a full sentence with a 
null pronominal subject: “[s/he] knew a lot about healing”. Prosodically however, 
the two parts of Figure 174, hampikuq runa and allaapa yachanaa hampikuyta 
are more closely fitted together than its first part is with the preceding unayshi 
kanaa huk, from which it is separated by a mostly filled pause of more than 500 
milliseconds (likely signaling continuation, as does the rising contour on unayshi
kanaa huk). Hampikuq runa is realized with a rising contour, also indicating
continuation, and separated by only a very short break (90 milliseconds) from allaapa
yachanaa hampikuyta. The realization of Figure 174 as a single utterance suggests 
a reading of its second part as a relative clause with the first part as head noun. 
Syntax and prosody thus seem to be able to contribute independently to discourse 
cohesion. 
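
The distinction drawn above between the two scaling cues (pitch level compared between phrases vs. pitch span on local pitch events) can be sketched as follows. This is a minimal illustration under stated assumptions, not the measurement procedure used for the figures: pitch level is approximated here by the F0 peak of a phrase, the 1-semitone threshold for counting a level change as downstep or upstep is arbitrary, and the F0 values are hypothetical rather than taken from Figure 175.

```python
import math


def semitones(f_ref_hz, f_hz):
    """Signed interval from f_ref_hz to f_hz in semitones."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


def compare_phrases(first_hz, second_hz, level_threshold_st=1.0):
    """Compare two adjacent phrases on pitch level and on local pitch span."""
    # Pitch level (assumption): peak of the second phrase relative to the first.
    level_change = semitones(max(first_hz), max(second_hz))
    # Local pitch span: highest relative to lowest F0 sample within each phrase.
    span_first = semitones(min(first_hz), max(first_hz))
    span_second = semitones(min(second_hz), max(second_hz))
    return {
        "level_change_st": level_change,
        "downstepped": level_change <= -level_threshold_st,
        "upstepped": level_change >= level_threshold_st,
        "larger_span_on_second": span_second > span_first,
    }


# Hypothetical F0 samples in Hz (NOT measured from the corpus): a rising first
# phrase scaled high, then a rise-falling second phrase scaled lower overall.
first = [200.0, 230.0, 228.0]
second = [170.0, 215.0, 150.0]
result = compare_phrases(first, second)
print(result)
```

With these toy values the second phrase comes out as downstepped in pitch level while still carrying the larger local span, so the two measures can point in different directions, as described for Figure 175.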

6.4.1.2 Second Cuento sequence: Syntactic and prosodic subordination

(184) AZ23_Cuent_Q_0274–0525⁴⁰⁵ (context for Figures 176 and 177, continuing right after (183))

time AZ23 

(seconds) 

27.4 pause 

28.2 (L H) 
tsay-chaw (0.7) 
DEM.DIST-LOC 
then 

29.8 (L H--------)(L H---)


32.0 


32.5 


35.2 


36.4 


38.8 


ala-tsi-ku-n mantsa-ku-n 
be.cold-CAUS-MID-3 be.afraid-MID-3 
he is cold, he is afraid 


tsiqtsi-kuna-si yuri-pu-ski-n 
bat-PL-ADD appear-DIR-ITER-3 

even bats appear 

(L H L) 
ala-tsi-ku-yka-pti-n 
be.cold-CAUS-MID-PROG-SUBDIFF-3 
[while] he is getting cold 

[(L H)¡(L H-----)]
mallaqa-y-ni-n tsari-ski-n 
be.hungry-INF-FON-3 grasp-ITER-3 
hunger grasps him [- he gets hungry] 
(L H---------------------------)(L H L)
tsay apa-naa chuqllu-ta-wan aytsa-ta
DEM.DIST bring-PST.REP corn-OBJ-INST meat-OBJ 
he had brought corn and meat 


405 https://osf.io/s9kqr/ 
41.6


44.4


47.2 


47.7 


51.1 


(L HL)(LHL)(L H L) 

almuersu-n na-n mirienda-n-ta 

lunch-3 PSSP-3 snack-3-OBJ 

his lunch, thing, his snack 

(L H-----)(L H----)(L H-------------)
tsay-chaw-na-m allaapa mallaqa-ski-n-lla-raa 
DEM.DIST-LOC-DISC-ASS much be.hungry-ITER-3-LIM-CONT 


then he is still very hungry 

L 

y 

and 

[(H L)(H L)(L H L)]


hoqta chuspi-ta tsari-ski-r miku-ski-n 

six fly-OBJ grasp-ITER-SUBID eat-ITER-3 

he catches and eats six flies / catching six flies he eats [them] 
(L H L) 

pacha-n hunta-na-n-paq 

stomach-3 fill-NMLZ-3-BEN 

in order to fill his stomach 


Figure 176: AZ23_Cuent_Q_0352.⁴⁰⁶ Cf. (184) for the context.


406 https://osf.io/Aas7m/ 

Figure 177: AZ23_Cuent_Q_0477.⁴⁰⁷ Cf. (184) for the context.


This section further discusses utterances that exhibit some internal phrasing at the 
level of individual PhPs, while still also cueing broad focus overall. Some of this 
internal phrasing corresponds to boundaries between high-level syntactic divi- 
sions. The utterances in both Figures 176 and 177 are of sentences that contain 
two verb phrases, a main verb and a subordinated one each. In Figure 176 the 
subordinated verb phrase is alatsikuykaptin “[while]⁴⁰⁸ he is getting cold”, and the
main one is mallaqaynin tsariskin “he gets hungry” (lit. “his hunger grasps [him]”).
Here, alatsikuykaptin clearly forms its own rise-falling phrase, with the final low 
tone reaching a low target early on the final syllable [tin] (cf. the rising contour on 
tsakapakuskin in Figure 175 where the low target is only reached with the onset 
of the next word). Mallaqaynin tsariskin is realized as two rising phrases, with 
the second, on tsariskin, upstepped: the H tone in the second phrase reaches far 
higher than the one in the first, and the initial L of the second phrase is also just 
a little lower than the H on the first and far higher than that phrase's initial L. It 
is clear that the boundary between alatsikuykaptin and mallaqaynin tsariskin is 
stronger than that between the phrases on mallaqaynin and tsariskin. This suggests 
the latter are part of one larger prosodic unit which contains both PhPs formed on 


407 https://osf.io/wur83/ 

408 The semantic relation between the action conveyed by the main verb and that conveyed by the 
verb marked as subordinated by -pti is underspecified: it can be interpreted as temporal (consecu- 
tive or contemporaneous), causal, conditional etc. depending on the context (Parker 1976: 143-144, 
160). Parker (1976: 160) states that what they express is that the action or state denoted by the sub- 
ordinated clause is a *prerequisite" for that denoted by the main clause. I adopt that formulation 
in the main text. 
the two words, but not that on alatsikuykaptin, like A(BC) or (A(BC)). The high-level
separation between alatsikuykaptin, on the one hand, and mallaqaynin tsariskin, 
on the other, reflects the fact that these form two separate propositions, “he got 
cold”, and “he got hungry”, both partial answers to the superordinate QUD WHAT 
HAPPENED ONCE IT GOT DARK? (answered by the entire sequence (184)), with “he got 
cold” already having been asserted at 29.8 and now serving as a prerequisite for 
“he got hungry”. Alatsikuykaptin here establishes a reference back to the scene set 
with the previous utterances (at 29.8-32.5). The separation is also reflected in the 
syntax: -pti in alatsikuykaptin marks a subordinated verb with a different subject 
from that of the main clause (Parker 1976: 143). Its subject is not the same as that of 
the main verb tsariskin “s/he grasps”, whose subject is the preceding mallaqaynin 
“his hunger”, with only the possessive marker -n (3rd person singular) coreferential
with the protagonist. In the following utterance at 38.8, the healer is reactivated as 
subject with the demonstrative tsay instead of a null pronoun, confirming that the 
subject change took place here. 

The final rise on the utterance indicates thematic continuation in the sense that 
the superordinate QUD WHAT HAPPENED ONCE IT GOT DARK? is not yet fully answered: 
the issues of hunger and lack of food are continued. The utterances at 38.8 and 41.6
together answer the subordinate QUD WHY WAS HE HUNGRY?: 38.8 answers a further
subordinate QUD (HAD HE BROUGHT SOMETHING TO EAT?) by asserting that the healer had
brought corn and meat, while 41.6 adds that this was for lunch, answering subordinate
QUDs along the lines of DID HE HAVE ANY LEFT? or WAS IT ENOUGH FOR DINNER, TOO? by
implicature.⁴⁰⁹ These two utterances each completely answer their respective QUDs as
well as together answering the QUD WHY WAS HE HUNGRY?, and are produced with
rise-falling contours. That they provide a background to the superordinate QUD
WHAT HAPPENED ONCE IT GOT DARK AND HE GOT COLD AND HUNGRY? is made clear at 44.4,
which brings this QUD back to the fore by repeating that he was hungry.


409 How to best analyze implicit QUDs in this sequence is a tricky question which cannot be fully 
treated here. To give an example, strictly speaking, according to the principles laid out in Riester et 
al. (2018); Riester & Shiohara (2018); Riester (2019), at 38.8 the implicit QUD would have to be WHAT 
DID HE DO?, because neither the verb apanaa ‘he had brought’ nor its objects chuqllutawan aytsata
‘corn and meat’ are previously mentioned in the context. However, the previous assertion that the
healer got hungry arguably evokes the question in a rational interlocutor whether a competent
healer setting out on a long journey through uninhabited country did not bring any provisions
with him. In such a context, the action denoted by the verb apanaa would be contextually given, 
making the implicit QUD something like WHAT HAD HE BROUGHT? or even HAD HE BROUGHT ANY 
FOOD?. Possibly, this is also cued by the objects being postverbal, because then word order is com- 
patible with a prosodic realization cueing final prominence (on the food items), unlike a verb-final 
one. Future research will have to show how it is possible to include such deliberations into the 
generation of implicit QUDs in a principled way. 
The morpheme -na in tsaychawnam mallaqaskinllaraa indicates a kind of topic
change (Leonel Menacho and Gabriel Barreto, p.c.), and the utterance is realized 
with rising contours, suggesting that a further partial answer to the superordinate 
QUD is still to be given. This happens at 47.7/Figure 177. There, huqta and chuspita 
likely form only-falling phrases each (more clearly on huqta). The falling contours 
possibly serve to highlight the six flies as new referents that are important in that 
they will play a crucial role in the story later, while at the same time keeping the 
phrases formed on huqta chuspita prosodically subordinate to the larger unit as 
signaled by the reduced local pitch span on them compared to that on the final word 
mikuskin. The subordinated verb tsariskir and the main verb mikuskin directly fol- 
lowing it are realized with a single rise-falling phrase with the subordinated verb 
realized only on the initial low stretch. In terms of local pitch span, the final pitch 
movement on mikuskin is larger than all the preceding ones in the utterance, serving 
to cue that not huqta chuspita alone, although realized with falling contours, but the 
complex proposition conveyed by the entire utterance, is the answer to the QUD 
(HE GOT HUNGRY AND?). In Figure 177, the subordinated verb and the main verb are 
phrased together, while in Figure 176, they were phrased apart. This fits the tighter 
syntactic relationship the two verbs have here: the subordinating morpheme -r in 
tsariskir indicates subordination with the same subject as the main verb mikuskin, 
in contrast to -pti used before. The subordinate verb and the main verb here not
only share the same subject, the null pronoun coreferential with the protagonist, 
but also the object, the six flies (huqta chuspita). Word order here is the unmarked 
OV.⁴¹⁰ The Quechua utterance arguably achieves an effect of cueing the object as
new while at the same time signaling a tight relationship between the two verbal 
actions (so that they can be interpreted as conveying a single complex proposition) 
and broad focus overall, something which no single English translation can do. 
The two options given in (184) each only achieve some of these properties, either 
unmarked word order or the subordinating relation between the two verbs. Mor- 
phosyntax and prosodic phrasing here align to cue that the utterances in Figures 176 
and 177 have different information structures: in Figure 176, the two verb phrases 
denote two separate propositions, with the one denoted by the main verb phrase 
mallaqaynin tsariskin asserted and the one denoted by alatsikuykaptin serving as a 
prerequisite (having been previously asserted). In Figure 177, in contrast, a single 
proposition is asserted composed of two verbal actions (grasping and eating). 
These individual examples support the hypothesis from section 6.2.3.3 that 
larger prosodic groupings exist which can include several PhPs as identifiable
from their tonal make-up. The extension of this unit is signaled via local pitch span 


410 Cf. sections 6.4.2.3 and 6.4.3 for discussions on which word orders are unmarked. 
differences: relatively smaller or reduced pitch span was observed on prefinal PhPs
in such units, while the final PhP had the greatest pitch span. Such groups of PhPs 
with greatest pitch span in final position all occurred in contexts with compara- 
tively broad focus, supporting the hypothesis that they form larger units on which 
prominence is final by default. The groupings created by prosodic phrasing, both at 
this higher level and at that of individual PhPs, interact with groupings created by 
morphosyntax, either to mutually reinforce when aligning, or to convey additional 
grouping information when not aligning. Tonal contour shape of individual PhPs 
(rising/falling) seems also able to cue distinctions aiding discourse coherence inde- 
pendent of focus, suggesting that global phrasing cued by relative differences in 
local pitch scaling (at a level above individual PhPs), local contour shape, and local 
phrasing (at the level of PhPs) each may contribute different types of information. 


6.4.2 Complex contexts and marked prosody in Maptask 


In the following, context examples from Maptask will be examined, with quite dif- 
ferent interactional dynamics and resulting prosody. Instead of long, uninterrupted 
narrative passages produced by one speaker, there is much more back-and-forth 
between the speakers, negotiating CG increments via clarification requests and 
specifications of how a proposition relates to elements already in CG. We will see 
how pitch scaling plays a role in cueing IS-partitions within larger prosodic units. 


6.4.2.1 First sequence: Initial and final backgrounded constituents 

In the first sequence, KP04 (with the path on the map) and TP03 (without it) have 
already been playing the maptask for about three and a half minutes. They pro- 
ceeded without problems until coming upon the spot where the maps differ (in the 
upper part of the map, the skunk and the lightning have swapped places, cf. Figures
188 and 189 in Appendix B). Confusion ensues and they realize that their maps 
are different. They change strategies: instead of KP04 leading the conversation by 
giving descriptions of the path for TP03 to follow, he now asks TP03 a series of ques- 
tions about the layout of this part of his map, trying to bring the landmarks, mostly 
known by now, into their correct relative positions. Arguably, the superordinate 
QUD that best represents this strategy has three open variables, two location varia- 
bles and one path variable, as in FROM WHICH LANDMARK TO WHICH LANDMARK DOES 
THE PATH GO WHICH WAY?. This is itself a subquestion to the more general WHAT IS 
THE CORRECT PATH? guiding the game overall. The speakers begin pursuing this new 
strategy starting from the lightning, which they identified as one of the landmarks 
that has a divergent location. This is where (185) takes off: 

(185) TP03&KP04_MT_Q_2010–2216⁴¹¹ (context for Figures 178 and 179)


time 
(seconds) 
201.0 


202.4 


205.8 
208.0 


209.2 


211.5 


213.5 


214.5 


215.1 


215.2 


KP04 (with the path on the map)  TP03 (without the path on the map) 


y tsiqtsi-qa 

and bat-TOP 

and the bat 
(L H- (LH Ron i) (L Hi) 
tsiqtsi-qa manka este tsay na-lla-chaw-mi⁴¹² este tsay manka-pa


bat-TOP pot uhm DEM.DIST PSSP-LIM-LOC-ASS uhm DEM.DIST pot-GEN 


the bat is just to the thing of the pot, of that pot 

(2.2) 
(L H----------- L)!(H L) 
huk costado-chaw ka-yka-n 
one side-LOC COP-PROG-3 
it is on the other side 

(L H---) !(L H-----)(H---------- L)K(H-——L ) 

tsawra-qa entonces kay-naw-mi rura-shun 

then-TOP then DEM.PROX-SIMIL-ASS do-1.PL.FUT 

so then this is what we'll do 


(L H-—--) 

tsay tillakuq 

DEM.DIST lightning 

that lightning 

(L H ———— L (LH) 


tillaku-yaq-cha aywa-nki aw 

lightning-TERM-ASS go-2 right 

you go to the lightning right 
(LH) 
ajá 
(L H)(LH L )!(H—— L ) 
[tillakuq] hawa-n-pa-m (?aywa-yka-n)
lightning below-3-GEN-ASS (?go-PROG-3)
(?it's going) below the lightning 

(L H) 

[tillaku] 


411 https://osf.io/pcu9b/ 


412 The morpheme -mi is here attached to a semantically empty predicate: na-lla-chaw-mi, where
na- is a placeholder morpheme speakers use when for whatever reason (lexical retrieval problems
etc.) they cannot say the actual word they mean (Leonel Menacho, p.c.). Thus na-lla-chaw-mi is an
assertion that something (manka ‘the pot’) is just at the location that is specified by something the
speaker cannot at the moment name.


216.5 (L H--)
tillaku-pa uh 
lightning-GEN 
the lightning uh 
217.1 (L----H--------------- ) (L H L) (LH L) 


tilla punku-pita may-chaw-tan manka 
lightning front-ABL where-LOC-DETVAR pot 
from in front of the lightning, where is the pot 
219.5 (L H )(L HM L(LH L) 
tillaku-pa kay-naw hawa-n-man-pa-chaw-chi manka 
lightning-GEN DEM.PROX-SIMIL below-3-DEST-GEN-LOC-CONJ pot 
from the lightning, it's like this in towards below it, the pot 


Figure 178: KP04_MT_Q_2165.⁴¹³ Cf. (185) for the context.


At 217.1, KP04 asks where the pot (manka) is from in front of the lightning 
(tilla(kuq)). Both objects are given referentially from the preceding discussion 
about their locations. However, while tillakuq is active from the preceding turn, 
manka was last mentioned at 202.4, nine turns and 13 seconds previously. At 217.1, 
tilla punkupita is a topic in the sense that it specifies the first location variable in the 
superordinate QUD FROM WHICH LANDMARK TO WHICH LANDMARK DOES THE PATH GO 
WHICH WAY?. It referentially restricts the open proposition denoted by the wh-ques- 
tion. Although the lightning is active, tilla punkupita is here realized fully probably 
because it changes roles: in KP04’s turn at 213.5 it was the goal, but is now the 
origin. This topic is the leftmost constituent in the utterance, realized with a single 
rising phrase. The final constituent manka is fully realized because the pot is not 


413 https://osf.io/ptjnu/ 
active enough in the directly preceding context: had the previous utterances been
about it as subject, maychawtan would have been contextually specific enough as a 
question. However, as it is, tsiqtsi “the bat" might have been another possible can- 
didate in the context. Thus manka here specifies the second locational variable in 
the super-QUD, but it is also referentially given. The wh-constituent may-chaw-tan 
is in focus, asking for the specification of the path variable relative to lightning and 
pot in the super-QUD. Context leaves ambiguous whether the pot is only part of the
question background, or separately specified, i.e. a second topic.⁴¹⁴ Prosody separates
the wh-word maychawtan from the final manka, realizing both with a falling
contour each. Manka is downstepped in terms of pitch level from maychawtan, but 
still clearly realizes its own contour. The initial topic is realized with a separate
rising phrase. In response, TP03 produces a declarative (Figure 179) with similar
syntax and prosody at 219.5. 

In his answer, the lightning, here tillakupa, is again realized first on a separate 
phrase with a rise. As in the question, it functions as a topic here, restricting the 
reference of the at-issue part of the proposition. The at-issue part, corresponding to 
the wh-constituent in the question, is expressed by the locative noun phrase kaynaw 
hawanmanpachawchi⁴¹⁵ “like this in towards below”, realized as one rise-falling


414 This difference could perhaps best be described by referring to the kind of hierarchical
QUD-structure the question implies. I suggest that it is the difference between two structures with
differing depths. In the first with three question nodes, the super-QUD FROM WHICH LANDMARK TO WHICH
LANDMARK DOES THE PATH GO WHICH WAY? leads on to a first subordinate question, FROM WHICH 
LANDMARK TO THE POT DOES THE PATH GO WHICH WAY?, and only then to the one actually asked, FROM 
THE LIGHTNING TO THE POT DOES THE PATH GO WHICH WAY?. In the second with only two question 
nodes, both landmark variables are filled at the same time in only one subordinate question. The 
former results in a question in which the pot is only backgrounded, while it is a second topic in the 
latter. In English, this difference would arguably be cued by the position of the strongest promi- 
nence (in capitals) in the corresponding question: 


(i) A. From the lightning, WHERE is the pot? [‘pot’ = backgrounded]
    B. From the lightning, where is the POT? [‘pot’ = second topic]


In both questions, what is at-issue/in focus is only the part denoted by the wh-element. 

415 The location-denoting noun hawa is here marked with not just one, but three, locative suffixes:
-man (glossed as DEST, marking a destination for a path), -pa (glossed as GEN, 'genitive', often also 
marking a location along which a path passes), and -chaw (glossed as LOC, marking a static loca- 
tion). Such multiple marking is not frequent in our corpora, but similar cases are also reported in 
Parker (1976: 86-87). The translation tries to reproduce this with the sequence of prepositions. It's 
possible that this proliferation of locative descriptors is due to TP03 struggling a bit to find the right 
formulation for the actual spatial layout (note the use of kaynaw ‘like this’ and his continued efforts 
to describe the location of the objects in subsequent turns): on his map, lightning, bat and pot form 
an isosceles upside-down triangle, with lightning and bat at the left and right top corners and the 
pot at the bottom corner. Thus, the pot is both to the right and below the lightning. 

Figure 179: TP03_MT_Q_2195.⁴¹⁶ Cf. (185) for the context.


phrase. This constituent is marked with the conjectural evidential -chi, one of the
evidentials said to attach to focused constituents in Southern varieties of Quechua 
(cf. Muysken 1995; Sánchez 2010; Muntendam 2015). Morphosyntactic marking 
here aligns with prosody and context to mark the focus domain, but we will see 
in subsequent examples that this is not always the case. The final constituent is 
again manka. It is also realized via its own phrase, this time not downstepped. It 
is a resumptive topic here, specifying the second referent about which the at-issue 
predication has been made even though this relation is already evident from the 
context. Even though manka is not downstepped as in the preceding wh-question, 
it is notable that the overall contour of these two utterances is clearly distinct from 
the broad focus utterances in Cuento. Here, context and morphosyntax indicate 
an IS-partition with narrow focus on the locational constituent, and this is cued 
prosodically. On the one hand, via the placement of a right phrase boundary of 
a falling phrase after the focal constituent, and on the other via a global contour 
whose scaling does not cue final main prominence and thus broad focus: the peak 
on the locational constituents is not scaled lower than that on the final constituent, 
manka. 


6.4.2.2 Second sequence: Differential interpretations for initial elements 

This section demonstrates that the same initial element can receive different inter- 
pretations in two utterances, even though local prosody and word order are very 
similar. It also demonstrates how verb-final sentences do not exhibit the rise-falling


416 https://osf.io/8v7rn/ 
contour cueing rightmost prominence that was observed in Cuento if the context is not
compatible with broad focus.


(186) TP03&KP04_MT_Q_2221–2329⁴¹⁷ (context for Figure 180; directly continued after (185))


time 
(seconds) 
222.4


223.7 


226.0 


227.4
227.6 


229.8 


230.5 


231.8 


KP04 (with the path on the TP03 (without the path on the 
map) map) 

L H 

tsay-pa- 

DEM.DIST-GEN 

there 

(L H--------------- L) I(H------------- L ) 

tsay-pita may-pa-tan tsiqtsi ka-yka-n 

DEM.DIST-ABL where-GEN-DETVAR bat COP-PROG-3 

from there where is the bat 


tsiqtsi-qa- 
bat-TOP 
the bat 
(0.5) 
(L H---------------- L )L H) 
tsay na ka-q-chaw este tillakuq 
DEM.DIST PSSP COP-AG-LOC that lightning 
where that thing is, the lightning 


(HL) 
ya 
(L H--)(LH L H L ) 
ya manka y tsiqtsi-m ka-yka-n 
ok pot and bat-ASS COP-PROG-3 
yes the pot is there with the bat 
(L H)( H----------------- L) !(H--=-L )(L  H——— L ) 
manka chawpi-chaw-mi ka-yka-n [y hawa-n-kuna-chaw 
pot middle-LOC-ASS COP-PROG-3 below-3P-PL-LOC 
the pot is in their middle and below them 


KP04 continues to ask about the location of the other objects. At 223.7, he asks where 
the bat is. In response, TP03 produces two turns (226.0–227.6) separated by a pause,


417 https://osf.io/bg78v/ 
asserting that the bat is where the lightning is. The overall sequence is here again one
of topic followed by at-issue/focal content, which in turn is followed by a postposed 
NP. The first turn consisting only of the constituent tsiqtsi-qa is marked morphologi- 
cally as topic but not produced with a rise. The focused constituent tsay na kaqchaw 
“where that thing is” is right-aligned with a rise-falling phrase. The postposed replace- 
ment for na, tillakuq “the lightning”, is produced with a rise. KP04’s response here is 
only via an acknowledgement token (229.8). TP03 seems to agree that the assertion he 
has just made is not sufficient as answer to KP04’s question given the purposes of the 
conversation, because he continues with an elaboration (230.5–231.8, cf. Figure 180).


Figure 180: TP03_MT_Q_2305.⁴¹⁸ Cf. (186) for the context.


These two utterances both initially realize manka with a rising phrase, followed by a
further constituent morphologically marked with -m(i), and realized with a falling
phrase. Then follows the existential verb kaykan, realized with a downstepped
falling phrase. Both of these are best understood as answering the QUD WHAT IS
THERE?, with “there” the location the discourse has progressed to previously,⁴¹⁹ i.e.
where the lightning (tillakuq) is and where the bat (tsiqtsi) has been asserted to be.
That this is the appropriate implicit QUD is signaled by the presence of the verbs
of being. The verb of being ka- is omitted in the 3rd person present when it is used


418 https://osf.io/ajm2n/ 

419 As also noted in section 5.1.3 on Spanish, in the Maptask, the progression from landmark to land- 
mark along the path on the map in the conversation of the participants is parallel to the progression 
the speakers make in the discourse via the negotiation of how the context set can be reduced. The 
landmarks and the path the speakers have successfully agreed on are, quite literally, their common 
ground. 
as a copula, but not when used existentially, or when it serves as a locative copula,
similar to Spanish hay (cf. Parker 1976: 68). This use is frequently expressed via the
progressive form, as here. As part of the background of the QUD, the verbs are also
not part of the at-issue content of these two utterances that answer it. Since the
previous context has established that the bat (tsiqtsi) and the lightning (tillakuq)
are at the location under discussion, what is new in the first utterance (230.5) is
the information that the pot (manka) is also there, and it is asserted that it is there
together with the bat, while the bat being there on its own is presupposed. This
difference in information status is likely responsible for the prosodic realization of
these two elements. Ya manka is realized in a separate rising phrase, like an initial
topic, although it quite clearly isn't one here: the pot is the one referent that hasn't
already been asserted to be in the location they are discussing. Y tsiqtsim is realized
in a separate rise-falling phrase, but downstepped relative to the rising phrase
on ya manka. This realization, instead of one e.g. in which the entire sequence is
realized with a finally prominent rise-falling phrase (as seen in Cuento, appropriate
for a context in which none of the referents are given or presupposed to be at
the location), reflects the context conditions outlined above, which the translation
also attempts to recreate: “yes the pot is there with the bat”, instead of e.g. “yes
the pot and the bat are there”. Tsiqtsim and kaykan are both downstepped, but
there are additional cues that their information structure differs: in both cases,
the downstep in pitch level can be taken to cue that the element to the left of the
downstep is more prominent. In the case of ya manka and y tsiqtsim, in addition,
the phrase on the first element is rising, and the second is marked with -m(i).⁴²⁰
Assuming leftwards scope for -m(i) (in accordance with Muysken 1995; Sánchez
2010), this would support the interpretation that the new information that the pot
is there with the bat is asserted. Yet the referents forming this proposition differ
in status: that the pot is there with the bat is at-issue, but the entailed proposition
that the pot is there is new, while that of the bat being there is presupposed.


420 Marking a constituent with one of the evidentials -m(i)/-ch(i)/-sh(i)/-cha is thought to indicate
focus on that constituent (Muysken 1995; Sánchez 2010). Yet see note 84 in section 3.7.3.2 as well as
the analysis in section 6.4.3 for problems with this account. Beyond this syntagmatic function, this
work does not discuss the paradigmatic semantic contribution of the different evidentials besides
the very broad characterization that -m(i) and -cha mark information as either asserted or based
on “best possible grounds” (Faller 2002, 2003), -ch(i) marks information as based on conjecture
and -sh(i) as based on hearsay. In work on southern Quechua varieties, a consensus seems to have
been reached that the meaning contributed by the evidentials falls under the broad category of
non-at-issue meaning (similar to that of discourse particles and intonation, cf. Zimmermann 2011;
Truckenbrodt 2012), but whether it is a conventional implicature (Roberts 2017) or a speech act
operator (Faller 2014) seems undecided. Hintz & Hintz (2017) pursue a somewhat different route
but are specifically concerned with the evidentials of South Conchucos Quechua.
Between tsiqtsim and kaykan, on the other hand, the metrical relation cued is also
strong-weak, but the left element tsiqtsim here is realized with a falling phrase and
marked with -m(i), both used to cue the right edge of focal material. This aids the
contextual interpretation that kaykan is only backgrounded.

In the second utterance (231.8), manka chawpichawmi kaykan, manka is also
realized with a rise, but the following -mi-marked locative noun is not downstepped.
Here, manka being realized with a separate rising phrase aids in its interpretation
as topical subject. Having been copular subject in the preceding utterance,
the pot is a prime topic candidate already, so that in a comparable sequence,
manka could probably be omitted and the utterance still be understood to be about
the pot. However, here there are three highly salient referents in the discourse: the
pot, the bat and the lightning, and the guiding QUD for the entire section of the
conversation is how they are located relative to each other. Under these circumstances,
it is more cooperative of the speaker to disambiguate the topic referent instead of
omitting it. If manka chawpichawmi were here realized as a single rise-fall with
the rise only occurring on the final element, like in the broad focus utterances in
Cuento, the sequence would be ambiguous. As illustrated in (148) in section 6.2.3, it
allows two interpretations, one with manka as modifier of chawpi (“x is the middle
of the pot”) and another with it as copular subject (“the pot is in the middle”).
I argue that a realization as a single rise-falling phrase with final prominence like
in Cuento would favour the former reading, or at least not disfavour it, whereas a
realization as here, with two separate phrases and the first not metrically subordinate
to the second, is more compatible with the latter reading. The latter reading
is undoubtedly correct in the context, because the third utterance here (see (185)),
y hawankunachaw “and below them”, with -n-kuna marking possession by a plural
possessor, clarifies that chawpi “the middle” is also to be related to this plural
referent, the two other salient entities in the context, the bat and the lightning. This
is further confirmed by looking at TP03's map, on which no objects are inside the
pot, but where the three objects are located exactly as he describes: pot, bat and
lightning form an upside-down triangle, with the pot at the bottom, so that it is both
between and below the other two.

Comparing the two utterances produced at 230.5 and 231.8, we have seen that 
despite initial similarities, they are quite different in their information structure, 
especially regarding the role of the initial manka “the pot”. In the first utterance,
its presence at the location is part of what is asserted, while in the second, it is 
topical, forming part of the background but selecting between elements competing 
in the context. Although this element by itself is realized in both cases with a rising 
phrase, I argue that the differences in tonal scaling between the elements in the two 
utterances act together with those in morphosyntax and the presence/absence of 
the conjunction to signal these differential interpretations. 
6.4.2.3 A positional template for information structural roles?

The observations here and on Cuento uncontroversially support the hypothesis 
of right-alignment of a (narrowly) focused constituent with the right boundary of 
a phrase, specifically a falling one. However, in order to better understand how 
prosody interacts with word order and especially the role of the separately phrased 
initial and final constituents I’ve been calling topics here, I will discuss and build on 
two proposals in the literature on Quechua about the relative position of elements 
in a sentence and their information structural roles. 

For Huallaga Quechua (also Quechua I), Weber (1989: 427–435) proposes that
an information packaging pattern for sentences exists in which zero, one, or several
-qa-marked constituents are initial, followed by a number (zero also possible) of
preverbal constituents plus the verb, of which one is evidential-marked, and then
optionally followed by further -qa-marked constituents. The function of -qa is to
mark elements that relate a sentence to its context, and it is compared to the notion
of topic (cf. Weber 1989: chapter 20). The preverbal material including the
evidential-marked constituent is broadly linked to focus (Weber 1989: 419, 428–431).
Sánchez (2010) builds upon this proposal for southern varieties of Quechua II in a
generative framework. In her account, fronted constituents in the “left periphery”
of a sentence can be either topical or focal and then require marking with -qa or
the evidentials, respectively. Postverbal constituents in the “right periphery” can
only be topical, never focal, are not always marked with -qa, and are sometimes
realized with “low pitch and voiceless vowels typically associated with breathy
voice” (Sánchez 2010: 38-39). Elements in the left periphery can be contrastive
(Sánchez 2010: 29), but not the postverbal topics (Sánchez 2010: 51). However, the
elements at the right edge are also said to “disambiguate between potentially
competing topics” (Sánchez 2010: 175), relativizing the claim about an absence of
contrast. I take the two proposals to capture an important observation about a
relation between relative positions and information structural roles, with elements
in a central position conveying the “main” or at-issue content, and those in the
periphery conveying instructions about how to relate that content to the ongoing
discourse, very broadly speaking, and in agreement with other observations that
forms encoding more discourse-related meanings seem to occupy peripheral positions
in larger units universally (e.g. Zimmermann 2011: 2034 with respect to the
clause). Yet our observations suggest that for our data, some of the generalizations
do not hold. Sánchez (2010: 12-13) states that canonical word order corresponding
to broad focus in Cuzco Quechua is verb-final both in transitive and
intransitive sentences, yet examples like AZ23 Cuent Q 0126 ((183), Figure 174)
show that in our data, object-final utterances can also be compatible with a broad
focus reading and be realized with a contour cueing final prominence. Examples
like Figures 178 and 179 also show that initial constituents (fronted according to
Sánchez 2010: 29-31) not marked with -qa can still be inferred to be topical from
the context. It seems unclear how best to define the units that determine centrality
and periphery here.

I would like to reframe the proposals by Weber (1989) and Sánchez (2010) 
somewhat, as a kind of template that serves as a heuristic default, presumably 
based on the frequent experience of pairings of constituent order and context, and 
involved in the generation of expectations, like metrical structure (see sections 
3.3.1, 3.7.3). While Weber (1989) and Sánchez (2010) mainly frame their proposals 
as being about morphological marking and word order within sentences, I suggest 
that it can apply to utterances or even sequences of turns, instead of individual 
sentences, and that prosody also contributes to cueing the position of some element 
relative to this template. That is to say, I propose that this is a structure based on the 
typical sequencing of information when it is negotiated in interaction (cf. Wiltschko 
2021). I do not mean to say at all that the proposals in terms of morphosyntax are
wrong, but that prosodic and morphosyntactic cues to discourse and information 
structure can be complementary or even in conflict with each other and that they 
all need to be taken into account to really arrive at an apt IS-interpretation. This is 
based on the broader idea that cues to IS are not categorical, but distributional or 
probabilistic. 

Taking what the two proposals have in common, I add suggestions on how 
observations about prosody and information structure can be formulated with ref- 
erence to the template, assumed to have (at least) three positions: two peripheral 
positions for elements that could be said to be somehow topical (given or active in 
the context and filling a variable in the referential specification connecting the prop- 
osition being asserted to its discourse context), an initial and a final one. Between 
them is a position on which material is realized that is at-issue. We could also go on 
and describe another circumferential position, even more initial or final, in which 
response particles or tags such as aw are located, but I will leave this to the side 
here.⁴²¹ I suggest that prosody cues the position of material relative to the template


421 Cf. the proposal of an “interactional spine” by Wiltschko (2021). She proposes that languages
have a grammaticalized structure derived from generalizations about how conversations normally
proceed, which contains two layers concerned with how information is negotiated in interaction,
the responding and the grounding layer, and one layer concerned with the actual informational
content, the propositional layer (Wiltschko 2021: 72-92). Her grounding layer can be roughly
equated with the two peripheral non-at-issue positions in the template I propose, and the
propositional layer with the central at-issue position. The utmost peripheral positions of e.g.
response particles and tags would correspond to the interactional layer. Importantly, she also does
not assume that the maximal unit for these domains of grammar is the sentence, but that it can
extend to turn sequences. The general idea that ‘preposed’ and ‘postposed’, i.e. peripheral,
elements somehow fulfill more discourse-related functions than their more syntactically integrated
counterparts at a
in three ways. Firstly, via phrasing: a boundary between two positions in the template
is at least a PhP-boundary (but not the other way around). Secondly, via pitch
contour choice: material in the initial position is preferentially realized with rising
contours, material in the central and final positions with falling contours. Thirdly,
via metrical structure as cued by tonal scaling: the middle position has the highest
prominence and is scaled highest. That would mean that a reduction in tonal scaling
(downstep) on an element cues that it is not in the central position. Applying this
template to the utterances analyzed so far predicts that those we have seen from
Cuento mostly consist of material in the central position, which squares well with
them having broad/final focus. The template allows us to unify observations about the
interaction between word order and syntagmatic uses of prosody, which is absolutely
necessary in my opinion given that they are intimately connected via their
essential use of sequentiality. An object in syntactically marked postverbal position
can nonetheless be part of an assertion with broad focus, as in Figure 174, or narrow
final focus, as in the utterance at 38.8 in (184), if it is realized as part of a rise-falling
contour with rightmost prominence. At the same time, we can say that constituents
like manka in Figures 178 and 179 are in a relevant sense postposed, because
they are phrased separately and not scaled higher than preceding elements, and that
this correlates with them having a function that connects the at-issue proposition
with the preceding context, even though there is no verb in these sentences relative
to which they could be postverbal. Regarding Figure 176 from Cuento, we can now
say that alatsikuykaptin is realized separately in the initial position, which fits its
function of establishing a relation to the preceding discourse context for the proposition
that is asserted, similar to the initial topics in the Maptask utterances.⁴²² I also
suggest that tonal scaling cues relative prominence within each of these three
positions: as seen in Cuento, a sequence of phrases in which the last one is scaled
highest in both pitch level and span is compatible with rightmost prominence. In
contrast, I propose that a sequence of phrases that are all downstepped relative to
each other from left to right cues leftmost prominence. Thus the copular verbs in


more central position in an utterance is of course at least as old as Givón (2018 [1979]). Note that 
Givón (2018 [1979]: 154) also explicitly includes intonation into what cues separated vs. integrated 
elements. 

422 Haiman (1978: 564) argues that such subordinate clauses, especially conditionals, have a very
similar meaning to topics, because both are “givens which constitute the frame of reference with
respect to which the main clause is either true (if a proposition), or felicitous (if not)”. He
supports this with evidence from languages in which formal marking for both is identical. Note how
Haiman’s definition and Parker (1976: 160)'s description that such subordinates are “prerequisites”
for the proposition denoted by the main clause are also related to the definition for semantic
presuppositions by Bas van Fraassen cited in Karttunen (1973: 169), that “sentence A semantically
presupposes another sentence B, just in case B is true whenever A is either true or false”.
Figure 181 are part of the central proposition, but cued to be non-at-issue. And this
also elucidates the difference in interpretation for the two mankas in the two utter- 
ances in Figure 181: the first is part of the assertion in central position and cued via 
scaling to be the most prominent part of that, because it is the only new addition 
in that assertion. The second, on the other hand, is in initial position and a topical 
subject. In terms of prosodic structure, the proposed model necessarily assumes a 
higher-level domain which partially determines the tonal make-up and scaling of 
the phrases it contains, but for which no separate boundary tones have been found. 
I suggest therefore that it is simply a maximal level of the recursive PhP/AP.
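
The proposed relation between tonal scaling and relative prominence within a sequence of phrases can be stated compactly. The following is only a minimal sketch under simplifying assumptions that are not part of the analysis itself: each phrase is reduced to a single hypothetical scaling value (e.g. its peak in Hz, whereas the text refers to both pitch level and span), and the function name `prominence_cue` is invented for illustration. As argued throughout, such cues are probabilistic and must be weighed against morphosyntax and context, so the output should be read as a default expectation, not a categorical classification.

```python
from typing import Sequence


def prominence_cue(peaks_hz: Sequence[float]) -> str:
    """Return the prominence pattern cued by the tonal scaling of a phrase sequence.

    peaks_hz holds one hypothetical scaling value per phrase, in left-to-right order.
    """
    if len(peaks_hz) < 2:
        # A single phrase has no scaling relation to a neighbour.
        return "undetermined"

    # Every phrase is scaled lower than the one before it (successive downstep):
    # taken here to cue leftmost prominence.
    if all(later < earlier for earlier, later in zip(peaks_hz, peaks_hz[1:])):
        return "leftmost"

    # The last phrase is scaled higher than all preceding ones:
    # compatible with rightmost prominence, as in the broad focus cases.
    if all(peak < peaks_hz[-1] for peak in peaks_hz[:-1]):
        return "rightmost"

    # Mixed patterns are left open; other cues would have to decide.
    return "undetermined"


# Illustrative values only, not measurements from the corpus.
print(prominence_cue([220.0, 190.0, 160.0]))
print(prominence_cue([180.0, 200.0, 230.0]))
```

Reducing each phrase to one value is a deliberate simplification; a fuller version would compare level and span separately and would return graded rather than categorical results.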

A template generating expectations as conceived in this way aids interpretation,
both when individual sequences conform to it and when they diverge from it.
Weber (1989: 428–436) suggests a similar relationship between the relative placement
of morphologically marked constituents and information structure, showing
that the pattern he has identified is preferential and provides an unmarked default,
with deviations from it being interpretable as what he calls “rhetorical devices”.
Sánchez (2010: 221–228) also observes that the intonational form of both left-peripheral
and right-peripheral elements can vary in her data, with some of the pitch
tracks of right-peripheral elements looking similar to what I have called downstep
here. In her data too, intonational form varies irrespective of whether a left- or
right-peripheral constituent is marked with an evidential or -qa, or left unmarked,
pointing to an independence between cues, as she notes (Sánchez 2010: 224). If
morphology, syntax, and prosody thus each interact independently with such a
projected preferential default, then they can also either align or misalign, which
might result in subtle interpretable effects, perhaps highlighting different aspects
of semantics or information structure. My main point here is that the cues to
positions should not be considered in isolation. This leads to problems when only
morphosyntax is considered without prosody. But we've equally also seen that prosodic
cues need to be interpreted together with those from other domains. Potential for
ambiguous cueing seems to exist in particular in the two initial positions. In a
sentence consisting only of a subject and a verb, in this order, the subject might be cued
as focal due to its immediate preverbal position, and it is also potentially topical
as an initial constituent. This ambiguity is perhaps already reflected in Sánchez
(2010: 29, 36, 38–39)'s observation that left-peripheral elements can be either focal
or topical, while postverbal elements are topical.⁴²³ We will encounter similarly
ambiguous examples below, especially in section 6.4.3.


423 Cf. the brief discussion about fronted constituents in Spanish in section 3.7.3.1 that also sug- 
gested that there is a separately phrased initial position in which both topical and focused material 
can be found. 
6.4.2.4 Third sequence: Cueing against context

In this section, before moving on to data from Conc, I will discuss one further 
sequence from the Maptask that demonstrates how additional information that is 
essential for an appropriate interpretation can arise from a conflict between cues 
from prosody and morphosyntax, on the one hand, and context, on the other. 


(187) TP03 & KP04 MT Q 2329–2398⁴²⁴ (context for Figures 181 and 182, directly
      continued after (186))

time       KP04 (with the path on the map)    TP03 (without the path on the map)
(seconds)

232.9      (LH) (HL)
           ajá o]k
233.8      (0.3)
234.1      (L H---- )
           y atuq-qa
           and fox-TOP
           and the fox
235.5      (LH L) !(LH---------- L) !(H---------- L )
           atuq-qa hana-chaw-mi uh na-chaw ka-yka-n
           fox-TOP above-LOC-ASS PSSP-LOC COP-PROG-3
           the fox is above there
237.4      (L H--)
           atuq-yaq m- at-
           fox-TERM
           to the fox
238.4      (L H-------- L)(LHL )!(H L)
           atuq-chaw-mi usha-n naani
           fox-LOC-ASS finish-3 path
           it is at the fox that the path ends


At 232.9, KP04 accepts TP03’s description of the relative location of the items via 
acknowledgement tokens, and proceeds. He then (234.1) introduces a new referent, 
the fox, topicalizing it via -qa, and asking about it.⁴²⁵ TP03 responds, saying that the


424 https://osf.io/jg2z8/ 

425 He produces atuq with the topic marker -qa and in a rising phrase. We could call this a
wh-question without wh-word or polar question without polarity (due to the absence of an inflected verb).
But it is perhaps better seen as a question whose type is entirely determined by context. Formally 
it is comparable to similar questions e.g. in Japanese, such as go-chuumon-wa? (HON-order-TOP) 
‘what is your order?’ (in a restaurant). Such questions are also possible in European languages, 
fox is above (235.5, Figure 181), i.e. above the location they had so far been talking
about. He realizes the initial topic, atuq-qa, with a separate rise-falling phrase. This
is followed by a downstepped rise-falling phrase, realizing the focused constituent
hana-chaw-mi. The following material, na-chaw ka-yka-n, is realized in a single
downstepped falling phrase, helping to cue it as backgrounded. Even though KP04’s
question is very broad, TP03 seems to have interpreted it to convey a QUD like
WHERE IS THE FOX? and not e.g. WHAT DOES THE FOX WANT FOR DINNER?, since in the
Maptask, only the first question is relevant for the overall discourse goal (Roberts
2012a) of following the correct path to the end. A possible explanation for why TP03
realizes atuq-qa in initial position here, instead of entirely omitting it (permissible
since it is part of the question) or realizing it in final position, as done with manka
earlier at 219.5, involves selection between active referents. Since the directly
preceding discourse had been about the relative locations of pot, bat, and lightning,
it is cooperative to repeat the somewhat suddenly introduced referent “fox” at the
beginning here to avoid confusion.


Figure 181: TP03 MT Q 2355.⁴²⁶ Cf. (187) for the context.


It is also likely that for similar reasons, TP03 understands KP04’s question to imply 
that the fox should be at the same location as those three, whereas for TP03, it is 
in a separate location. Realizing atuq-qa in initial position and downstepping the 


like “and your bike?”, “y el zorro?”, or “und die Wasserwaage?”. In our Quechua data, unlike neutral
polar and wh-questions, these types of question are probably most often realized with a rise. Cf. 
also note 48 in section 3.6.1. 

426 https://osf.io/7g3vk/ 
phrase with the focused constituent might be a way of indicating the presence of
this perceived expectational contrast in the context.⁴²⁷

KP04’s response at 238.4 provides further evidence that TP03’s utterance in
Figure 181 implies a perceived lack of relevance of KP04’s question about the fox
(234.1) to the previous discourse. It is an assertion that can be understood as
justification for bringing up the fox at all: atuq-chaw-mi usha-n naani “it is at the fox that
the path ends” (238.4, Figure 182). This utterance is remarkable considering the
context: from the preceding discourse the fox is active as topic, and it was marked as such in
both KP04’s question (234.1) and TP03’s response to it (235.5). Applying the criteria
developed by Riester and colleagues, the implicit QUD here has to be WHAT HAPPENS
AT THE FOX?, so that the fox is backgrounded in the corresponding assertion. An
utterance conveying the proposition “the path ends at the fox” at this point would
therefore be expected to be structured in such a way as to maintain the fox as topic,
and the predicate “the path ends” as the at-issue content. From what we know,
such an utterance might for instance be naani ushan atuqchawqa, with atuqchawqa
downstepped; or atuqchawqa ushan naani, with it realized as a rise, and ushan
naani in a falling phrase signaling its status as at-issue content; or just ushan naani,
with the fox omitted as topical subject. Yet none of this is the case.


Figure 182: KP04 MT Q 2384.⁴²⁸ Cf. (187) for the context.


427 It is tempting to also attribute the substantial rise on the lengthened initial syllable of the 
focused constituent hana-chaw-mi to this contrast, in the manner of iconic prosody like “it's faaar”.
I can only note this auditory impression by a listener whose hearing conventions are shaped by 
European languages. Should it turn out to be a real phenomenon in future research, it would be 
important to investigate what anchors the position of the rise. 

428 https://osf.io/xn256/ 
The constituent denoting the fox, atuqchawmi, is realized initially, -mi-marked with
a rise-falling phrase, cueing it to be focused as answer to the QUD. Ushan naani is 
realized with two downstepped falling phrases, i.e. in a position and with prosody 
so far seen as cueing backgrounded information. Thus the information structure 
inferred and expected from the context and that signaled by the utterance itself are 
in conflict here. This is summarized in (188). 


(188) Comparison of information structure of KP04 MT Q 2384 as inferred from
      context and as likely cued by morphosyntax and prosody

Inferred from context:                 [not at-issue; topic]   [at-issue; comment  ]
                                       atuq-chaw-mi            ushan       naani

Signaled by scaling relations:         (x                                          )
                                       (x                  )   (x      )   (x      )
Signaled by contour shape:             [closed             ]   [closed ]   [closed ]
Cued by morphology:                    [focus              ]   [background         ]
Cued by word order relative to verb:   [focus              ]   [background         ]
                                       [comment                        ]   [topic  ]


From (188), it seems that formal marking suggests a QUD like WHERE DOES THE
PATH END?. However, if that were really the implicit QUD, the utterance should be
expected to be simply atuqchawmi, which would not convey any new information
in this context. As in previous examples and according to the proposed template, the
fact that constituents that could be omitted are instead realized as downstepped
in the postfocal final position seems to indicate that these elements are needed to
relate the proposition conveyed by the utterance to its context, or that it should
be understood as answering a QUD that is different from the one inferrable from the
context. So, if formal marking here aligns to signal a QUD like WHERE DOES THE PATH
END?, in spite of this going against the preceding context progression, this gives rise
to an implicature that the utterance is relevant with this information structure,
even though it does not seem to be so at first sight. I argue that the missing puzzle
piece for understanding this lies in the fact that the information where the path
ends is necessary for the completion of the game. Without knowing this information,
the overall discourse goal cannot be achieved. Thus, WHERE DOES THE PATH
END? is a relevant QUD because responses to it constitute a partial answer to the
superordinate QUD of the game, WHAT IS THE CORRECT PATH?. Because answering
this question is thus by definition relevant for the game, answering it with “at the
fox” in turn gives an explanation for why KP04 had brought the fox up at 234.1,
the relevance of which, as argued, TP03 had implicitly questioned at 235.5. It
is because the utterance, in utilizing both prosodic and morphosyntactic means
against the expectations built up by the context, manages to answer both WHAT
ABOUT THE FOX? and WHERE DOES THE PATH END?, that I have translated it here with
an it-cleft, “it is at the fox that the path ends”. In my understanding, such a structure
can also be used to express such a complex information structural configuration,
in which the fox is both contrastively specified as answer to WHERE DOES THE PATH
END? and the topic about which the proposition is predicated.

This section has provided evidence that utterances which from the context can 
be inferred to have a more complex internal information structure than the broad 
focus observed in the previous section in Cuento do reflect this increased complex- 
ity in their prosodic realization, although no simple one-to-one correspondence 
between prosodic marking and information structural categories can be assumed. 
I proposed a heuristic tripartite positional template that prosodic cues as well as 
cues from other domains can refer to, in which material realized to the left of a 
first relevant prosodic boundary and with a rising contour is linked to a topical 
interpretation, as is material to the right of a second relevant prosodic boundary. 
Material in between these two boundaries, realized with a falling contour, is linked
to an at-issue interpretation. A reduction in tonal scaling between two adjacent
phrases (downstep) has been taken to cue that the second phrase is less prominent
than the first one, in opposition to the increase in tonal scaling between units
we saw in the previous section in Cuento, cueing rightmost prominence compatible with
broad focus. In these data from Maptask, we also continued to observe how pro- 
sodic and morphosyntactic cues interact in the signaling of information structure. 
Mostly, they could be seen to plausibilize an interpretation that is also inferrable 
from the context. In the final example we also saw how they aligned to contra- 
dict the interpretation from the preceding context. It was argued that plausible 
reasons for this can be found when more than the immediate discourse context is 
considered. In the final section, utterances with conflicting cues will be examined. 
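
The positional template summarized above can be illustrated with a minimal sketch. It is only a heuristic toy, not an implementation of the analysis: the class `Phrase`, the function `template_reading`, the two boundary indices, and the label "unresolved" are all hypothetical names introduced here, and the identification of the "relevant" prosodic boundaries is simply assumed to be given.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    contour: str  # "rise" or "fall" (assumed to be annotated beforehand)


def template_reading(phrases, first_boundary, second_boundary):
    """Assign a heuristic reading to each phrase from its position and contour.

    Phrases with index < first_boundary lie to the left of the first relevant
    boundary; phrases with index >= second_boundary lie to the right of the
    second one. Both indices are assumptions supplied by the analyst.
    """
    readings = []
    for i, phrase in enumerate(phrases):
        if i < first_boundary and phrase.contour == "rise":
            # left of the first boundary and rising: topical
            readings.append((phrase.text, "topical"))
        elif i >= second_boundary:
            # right of the second boundary: topical as well
            readings.append((phrase.text, "topical"))
        elif first_boundary <= i and phrase.contour == "fall":
            # between the two boundaries and falling: at-issue
            readings.append((phrase.text, "at-issue"))
        else:
            # cues that the template does not resolve by itself
            readings.append((phrase.text, "unresolved"))
    return readings


example = [Phrase("A", "rise"), Phrase("B", "fall"), Phrase("C", "fall")]
print(template_reading(example, first_boundary=1, second_boundary=2))
```

The "unresolved" branch is deliberate: as stated above, no one-to-one correspondence between prosodic marking and information structural categories is assumed, so combinations outside the template are left open rather than forced into a category.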


6.4.3 Variability and conflicting cues in Conc 


This section will explore further how cues to information structure from prosody, 
morphology, and word order can interact using data from Conc, mainly by ZR29 & 
HA30, but also in comparison with other speakers. I will begin by recapitulating 
what we know so far about how word order is related to signaling information 
structure. In the Maptask examples, it was often the case that the constituent 
directly in preverbal position was (narrowly) focused, which would align with 
observations on a number of verb-final languages (cf. Kügler & Calhoun 2020: 
465). For southern Quechua II varieties, Muysken (1995: 383) and Sánchez (2010:
36-37) assert that postverbal elements cannot be focused or marked by one of the 
evidentials. Sánchez (2010: 12-13, 37, note 10) further claims that the canonical 
word order is verb-final, and that this is linked to broad focus. According to her,
other word orders signal a different information structure. In Cuento, utterances
with the “non-canonical” order V-O have also been seen to be clearly interpretable
as having broad focus from both context and prosody (cf. Figure 174 and (183)). 
For Huallaga Quechua, which as a variety of Quechua I is more closely related to 
Huari Quechua, Weber (1989: 15-16) points out that even though it shares many 
properties typically claimed for SOV-languages (postpositions instead of prepo- 
sitions, modifiers before their heads, main verbs before auxiliaries, possessors 
before possessed, cf. Greenberg 1966), main constituent word order is actually not 
so much one of them: in his count, only slightly less than half of all sentences real- 
izing all three constituents (48 of 99 out of a total of 1309 observed sentences, 714 
of which transitive) actually did so in SOV order. He does however also connect 
the preverbal material to a notion of focus (Weber 1989: 419, 428–431). All three
accounts agree that word order is sensitive to information structure, and that 
there is at least a preference for focal elements to be realized preverbally. In our 
data, an evidential-marked constituent does not occur postverbally in any of the seven
Conc, Maptask, and Cuento corpora.⁴²⁹ However, there is evidence that a postverbal
constituent can occur in contexts in which it is the answer to the QUD, i.e.
focused, and also contrasted, as in some of the following examples. 
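
As a quick check of the proportions behind Weber's observation, the following sketch recomputes them from the counts cited above. The variable and function names are introduced here for illustration only; the figures are those reported in the text.

```python
# Counts as reported in the text for Weber (1989: 15-16).
total_sentences = 1309          # all observed sentences
transitive_sentences = 714      # of which transitive
all_three_constituents = 99     # sentences realizing S, O, and V
sov_order = 48                  # of these, realized in SOV order


def share(part, whole):
    """Return part/whole as a plain proportion."""
    return part / whole


sov_share = share(sov_order, all_three_constituents)
full_share = share(all_three_constituents, total_sentences)

print(f"SOV among sentences with all three constituents: {sov_share:.1%}")  # 48.5%
print(f"Sentences with all three constituents overall: {full_share:.1%}")   # 7.6%
```

The second figure shows how small the base for the word order count is: fewer than one in ten of the observed sentences realize all three constituents at all.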


6.4.3.1 Cues to information structure in five examples from ZR29 & HA30's Conc 
I will first introduce examples from Conc and then expand the discussion to cues 
conflicting with each other and with IS as constructed from the context of the 
respective examples. To my knowledge, while multiple cues also conflicting with 
each other have e.g. been discussed in Baumann & Grice (2006); Venditti et al. 
(2008); Kaiser (2016); Edeleva et al. (2020), among others, this is the first attempt for 
Quechua to show that there are cases in which cues conflict both with each other and
with expectations from the discourse context. The examples are HA30 Conc Q 1482
((189)/Figure 183) and 1576 ((190)/Figure 184), ZR29 Conc Q 1150 ((191)/Figure
185), and HA30 Conc Q 1836 ((192)/Figure 186).


(189) HA30 Conc Q 1482 
na   washa    chawpi-chaw ka-yka-n   este tsuku
PSSP DEM.DIST middle-LOC  COP-PROG-3 uhm  hat
“in the middle of that over there there is the hat”


429 Muntendam (2010: 117, 126) reports -mi-marked postverbal constituents for an unidentified
variety of Quechua not studied by herself, but does not find them in her own data from Ecuadorian 
and Bolivian varieties. 
(190) HA30 Conc Q 1576
hana  kantu-chaw ka-yka-n   arash
above edge-LOC   COP-PROG-3 lizard
“at the border above there is the lizard”


(191) ZR29 Conc Q 1150 
kay ultimu-chaw-mi ka-n pitu 
DEM.PROX last-LOC-ASS COP-3 whistle 
“in the last one here there is the whistle” 


(192) HA30 Conc Q 1836
arash  ka-yka-n   tsay     chawpi-chaw
lizard COP-PROG-3 DEM.DIST middle-LOC
“the lizard is in the middle there” 


Figure 183: HA30 Conc Q 1482.⁴³⁰ Cf. (195) for the context.


All these examples have non-verb-final word order: a form of the copula is sur- 
rounded on one side by a noun specifying the location at which an item in the game 
is to be found, and on the other by another noun specifying the item. As explained 
in section 6.2.2, speaker pairs mostly follow one of two strategies in Conc consist- 
ently, either to realize the location-specifying expression before the item-specifying 
one (pattern a), or the other way around (pattern b). Correspondingly, in three of 
the four examples they first specify locations and then the items to be found at 


430 https://osf.io/4ahqe/ 
them. HA30 Conc Q 1836 ((192)/Figure 186) is the exception which reverses the
order of the nouns. Section 6.1.1 had an example from this Conc (repeated here as 
(193)) with verb-final word order: location — game item — copula. 


Figure 184: HA30 Conc Q 1576.⁴³¹ Cf. (195) for the context.


Figure 185: ZR29 Conc Q 1150.⁴³²


431 https://osf.io/w5kzy/ 
432 https://osf.io/bhqxy/ 
(193) HA30 Conc Q 0211 (repeated from (108)/Figure 125, see also (125))
washa    kantu-chaw ukush-mi  ka-ykaa-n
DEM.DIST edge-LOC   mouse-ASS COP-PROG-3
“at that border there is a mouse”


Figure 186: HA30 Conc Q 1836.⁴³³


In (193), the location is realized with a rise, and then the -mi-marked
item-specifying expression plus the locative copula follow in a falling phrase. Word
order, morphosyntactic marking and prosody here all act just as the context lets us
expect: a location-specifying referent is called up as a topic, and then a predication
is made about that location regarding which item is to be found there in a way
that also conforms to the predictions made about word order and -mi-marking
on other Quechua varieties (see above). In the four other examples though, things
are different. Word order here is not verb-final as in HA30 Conc Q 0211. From the
predictions about the relation between word order and information structure by
Muysken (1995), Sánchez (2010), and Weber (1989), as well as partially from the
analysis of the Maptask examples, the preverbal constituent (location-specifying in
(189)/Figure 183, (190)/Figure 184, (191)/Figure 185, item-specifying in (192)) would
be expected to be focal. It should be -mi-marked and realized with a falling phrase,
so that it is clearly cued as the at-issue component of the assertion. The postverbal
constituent would be expected to be downstepped, cued to be interpreted as
postfocal/non-at-issue. At most it might be a resumptive topic, but not denote a newly
introduced referent whose specification is a crucial part of what is asserted. None of
these expectations holds.


433 https://osf.io/28nq9/

Instead, the initial constituent is only once realized with -mi (in (191)/Figure
185), and also realized with a rise instead of a fall in (190) and (192) (probably also 
in (189)/Figure 183, but there is interference from consonants). The postverbal con- 
stituent is realized as downstepped only in (190) and (192). In (191), it is realized 
with a rise, and in (189)/Figure 183 with a rise-fall whose high tonal target is just 
as high as preceding ones, while it is supposedly backgrounded in all of them. The 
copula however is downstepped or low in all five utterances. 

An analysis of their contexts does not fully mirror the divisions made by these 
formal cues. Instead, context can best be said to group (189)/Figure 183, (191), and 
(193)/Figure 125 together, with (190) and (192) separate. In particular, (189) and (191) 
are very similar contextually. In both of them, the item-specifying expression (tsuku 
“hat”, and pitu “flute”, respectively) denotes an item mentioned for the first time in 
the game. They are first turns after a speaker change, and the location-specifying 
expression also denotes a new referent. In (193), the location-specifying expression 
washa kantuchaw is repeated from the directly preceding turn at 10.3 consisting 
only of that phrase (cf. (194)). The item-specifying expression ukush-mi “mouse” 
is lexically given, because HA30 begins by making an enumeration of all the items 
that are there. 


(194) Beginning of ZR29 & HA30 Conc Q⁴³⁴


time HA30 
(seconds) 
0.8 (L H Le ) 


tsay-chaw-mi ka-n 
DEM.DIST-LOC-ASS COP-3 
there is 
2.5 (L H) 
ukush 
mouse 
4.2 (H L) 
tsuklla 
hut 


434 https://osf.io/z6akq/ 
5.2 (H L)
uusha 
sheep 


[moderator intervenes] 


10.3 (H-— L-------------- ) 
washa kantu-chaw 
DEM.DIST edge-LOC 
at that edge there 


The game moderator intervenes after the first three items, explaining how the 
game is played, and HA30 continues with the turn at 10.3, leading to the one at 
21.1, which is (193). Thus, by the time of (193), ukush has already been mentioned, but it is
not unambiguously the most salient referent in the discourse. From the context, it 
makes no sense to interpret the item-specifying constituents as backgrounded in 
any of (189)/Figure 183, (191), or (193)/Figure 125. The opposite is the case:
specifying them means selecting from a limited set of alternatives in all cases (implying
a degree of contrast), i.e. all the other available items in the game (twelve in total), 
or in the case of (193), perhaps from the set of the three mentioned initially. This 
selection is new information in all three. In (189) and (191), the item-specifying 
constituent itself is totally new. So it is clearly at-issue in all three examples. Yet it 
is realized postverbally with N-V-N order in (189) and (191), and preverbally with 
N-N-V order in (193). If any contextual differences are to be invoked to explain this
formal variation, they must be the subtle ones discussed here: the possible
differences in the degree of activation of the item referent (ukush being more active in
(193) than tsuku and pitu in (189) and (191)), that ukush is perhaps selected from a
smaller set of referents, and the fact that (193) is directly preceded by a turn that
already realizes the location once. But the difference cannot be one in terms of 
the overall at-issue partition of the utterances, since the item-specifying expression 
is in focus in all three. Note that the postverbal item-specifying expression is not 
downstepped in (189) and (191). 

The situation is different again in (190). This is the fourth attempt in this Conc 
to correctly guess the location of the lizard (arash). In a previous turn, ZR29’s at 
135.2, its location had last been wrongly guessed (cf. (195)), so that the utterance is 
perhaps even a kind of reversal/correction of that move. 
(195) ZR29 & HA30 Conc Q 1352-1591⁴³⁵ (context for (189)/Figure 183 and (190)/Figure 184)

time HA30 ZR29 

(seconds) 

135.2 (L H--) (HL-------------------- (L H) 
kay pitu ladu-n-chaw-na-mi arash 
DEM.PROX flute side-3-LOC-DISCONT-ASS lizard 
beside that flute now there is the lizard 

137.9 [short discussion between moderator and ZR29] 

148.2 (L H---------------------------- "H L )(L H D 


na washa chawpi-chaw ka-yka-n este tsuku 
PSSP DEM.DIST middle-LOC COP-PROG-3 uhm hat 
there in the middle is the hat 
151.7 (H--------— L--) 
chawpi-chaw 
middle-LOC 
in the middle 
156.4 (L H---) 
y hana 
and above 
157.6 (LH--——— )I(H_L----------- ) 
hana kantu-chaw ka-yka-n arash 
above edge-LOC COP-PROG-3 lizard 
at the edge above there is the lizard 


Thus, arash is here both referentially and lexically given, and determining its loca- 
tion is a salient issue in the discourse. This makes it more likely that the utterance 
has an implicit QUD like WHERE IS THE LIZARD?, as opposed to the previous
examples, which answer a QUD like AT [LOCATION], WHAT IS THERE?. This view is
supported by the fact that kaykan arash is very clearly downstepped and arash
possibly even part of the phrase-final low,⁴³⁶ cueing it as backgrounded. However, the
preverbal location-specifying constituent is neither -mi-marked nor realized with 
a falling contour, which goes against what we might expect with the QUD WHERE IS 


435 https://osf.io/3r5x8/ 

436 Here it is plausible to see arash as not contrastive in the sense of not having any salient alter- 
natives, because it is part of the background of the QUD. This shows again that word order does 
not suffice as cue, because we just saw that postverbal elements can be focal and contrastive (pace 
Sánchez 2010: 51). Arguably it needs the combination of being postverbal and scaled down to cue 
backgrounding (and a realization in final position in the proposed template) here. 
THE LIZARD?. The case for an implicit QUD in which the item is given instead of the
location is even stronger for (192). The move by another speaker directly preceding
it in the game, by ZR29 at 169.7, is mas ladunchaw arash “further to its side is
the lizard”, so that (192) can be seen as a kind of retort,⁴³⁷ asserting that the item
is at a different location. This interpretation is supported by the fact that (192) is
the only utterance in the game by these two speakers in which the item-specifying
expression arash is initial and preverbal, and the location-specifying expression
is postverbal. Furthermore, arash is also realized with a rise, supporting its status
as a topic. Here, word order and prosody thus seem to align in cueing the item
as topical. However, both the verb and the postverbal location-specifying
expression are also clearly downstepped and thus not prominent despite being at-issue
from the context. In both (190) and (192), the cues that signal the referent
contextually determined to be non-at-issue as such seem stronger than any that cue the
reverse, i.e. signaling the other referent as at-issue. In all of these examples, with
the exception of (193)/Figure 125, we see conflicting cues. (196) summarizes these
cues to IS from context, word order, morphological marking, and prosody in the
five utterances.


(196) Conflicting cues to information structure from context, word order, morphol- 
ogy, and prosody in five utterances from Conc by ZR29 and HA30 


HA30 Conc Q 1482 (= (189))
  IS via context        [topic] [focus]
  utterance             na washa chawpichaw        kaykan   este tsuku
  order of elements     location-specifying expr.  verb     item-specifying expr.
  IS via word order
    relative to verb    [preverbal/focus] [postverbal/background]
    overall             [topic] [focus]
  IS via morphology     -  -  -
  IS via prosody
    contour shape       [rise/open?] [fall/closed] [rise-fall/closed]
    scaling             [low/downstepped]
    metrical structure  ( X ) ( X )

HA30 Conc Q 1576 (= (190))
  IS via context        [focus] [topic]
  utterance             hana kantuchaw             kaykan   arash
  order of elements     location-specifying expr.  verb     item-specifying expr.
  IS via word order
    relative to verb    [preverbal/focus] [postverbal/background]
    overall             [topic] [focus]
  IS via morphology     -  -  -
  IS via prosody
    contour shape       [rise/open] [fall/closed] [fall/closed]
    scaling             [low/downstepped] [low/downstepped]
    metrical structure  ( X )
                        ( X ) ( X )

ZR29 Conc Q 1150 (= (191))
  IS via context        [topic] [focus]
  utterance             kay ultimuchawmi           kan      pitu
  order of elements     location-specifying expr.  verb     item-specifying expr.
  IS via word order
    relative to verb    [preverbal/focus] [postverbal/background]
    overall             [topic] [focus]
  IS via morphology     [-mi-marked/focus]
  IS via prosody
    contour shape       [rise-fall/closed] [rise/open]
    scaling             [low/downstepped]
    metrical structure  ( X ) ( X )

HA30 Conc Q 1836 (= (192))
  IS via context        [topic] [focus]
  utterance             arash                      kaykan   tsay chawpichaw
  order of elements     item-specifying expr.      verb     location-spec. expr.
  IS via word order
    relative to verb    [preverbal/focus] [postverbal/background]
    overall             [topic] [focus]
  IS via morphology     -  -  -
  IS via prosody
    contour shape       [rise/open] [fall/closed]
    scaling             [low/downstepped]
    metrical structure  (   )

HA30 Conc Q 0211 (= (193)/Figure 125)
  IS via context        [topic] [focus]
  utterance             washa kantuchaw            ukushmi                kaykan
  order of elements     location-specifying expr.  item-specifying expr.  verb
  IS via word order
    relative to verb    [preverbal/focus]
    overall             [topic] [focus]
  IS via morphology     [-mi-marked/focus]
  IS via prosody
    contour shape       [rise/open] [fall/closed]
    scaling             [low/downstepped]
    metrical structure  ( X ) ( x )


437 It is not a direct response to ZR29’s move because that was followed by the game moderator
turning the card over and showing that her guess was wrong.


Word order is given as cueing IS in two different ways in (196). As argued in 
section 6.2.2, the game context likely induces an overall bias to treat the loca- 
tions as more active/accessible in the discourse. That makes them more likely to 
be topics about which predications are then made in which the items are focal. 
This is what plays out in their overall word order, in which the location-specifying 
expression precedes the item-specifying expression in four out of the five cases 
here, with (192) being the only exception to that in the entire game. As argued, 
(192) follows the order topic-comment, but with the item exceptionally being the 
topic and initial. For (190)/Figure 184, I also argue that the item should be inter- 
preted as more topical or backgrounded than in the other three examples, but 
here, word order does not reflect that, instead being the same as in (189) and (191), 
so that the postverbal item is once backgrounded and twice focal. On the other 
hand, word order relative to the copular verb⁴³⁸ cues preverbal focus and postverbal
background, according to the predictions made for other varieties and to
what we have seen in the previous section from the Maptask examples. These two
different ways in which word order can be seen to cue IS are given as “overall” and
“relative to verb”, respectively. This means that in all examples except (193)/Figure
125, overall word order between location-specifying expression and item-specifying
expression, and word order relative to the verb, are in conflict with regard
to their cues to information structure. Overall word order aligns with context in
almost all cases, except (190)/Figure 184. Verb-medial utterances with only one
constituent to the left and the right of the verb, respectively,⁴³⁹ are an inherently


438 Most other utterances from the Conc data by all speakers are verbless. 
439 This is the preferred realization by HA30 in this corpus, see Table 49 below. 
ambiguous case: according to the position template proposed in the previous
section, in such cases, the initial constituent will tend towards a topical
interpretation, but it will also be likely to receive a focal interpretation because it is directly
preverbal.⁴⁴⁰ In terms of prosody, three different cues are also distinguished. In
(190)/Figure 184, (192), and (193)/Figure 125, the rise on the initial element, cueing
openness, aligns with the topical interpretation from context and overall word
order. In both (193) and (189), the non-downstepped rise-fall on the item-specifying
constituent cues completion, supporting an interpretation as focal comment, but
its realization with a rising phrase in (191)/Figure 185 does so less. However, the fact
that the postverbal item-specifying constituent is not downstepped in (189) and (191)
but is in (190) supports the contextual interpretation that it is focal in the two former,
but backgrounded in the latter. In both (192) and (190), that the item-specifying
constituent is realized as downstepped aligns with the contextual interpretation of it
being backgrounded and non-contrastive. None of the utterances unambiguously
cues final main prominence. The downstepped verb in all utterances suggests that
the preverbal element is at least locally more prominent. The preverbal element
is also globally most prominent in the verb-medial utterances (190) and (192),
because the postverbal element is also downstepped. However, in (189) and (191),
the downstepped verb is surrounded by equally scaled phrases. The phrases
realizing the location- and item-specifying expressions are thus relatively balanced in
prominence cues, with a rightmost default acting in favour of the final element.
In (191), the realization as a falling phrase and the marking with -mi on the
location-specifying constituent support its interpretation as at-issue, but this clashes
with the rising realization of the item-specifying constituent, which makes it
difficult to interpret it as backgrounded.⁴⁴¹ Overall however, scaling-based cues seem
to align more clearly with the contextual interpretation than contour-based ones,
and word order cues seem to also need scaling cues in order to be interpretable for
an IS-partition into at-issue and non-at-issue material.


440 This analysis is also applied to KP04 MT Q 2384 ((188)/Figure 182) in the previous section. 
According to Sánchez (2010), left-peripheral elements can also be either focal or topical in southern
varieties of Quechua. The conundrum is rather old: with two proposed principles of
pregrammaticalized linearization, the fronting of unexpected information and that of important information,
Givón (2018 [1979]: 180, 217, 221) plausibilizes the initial realization of both new topics
(“L-dislocation”), and of foci (in clefts).

441 Several other utterances by ZR29 in Conc realize the final item-specifying expression as
rising. I suggested in section 6.2.2 that this might be part of a stance of uncertainty projected by this
speaker.
6.4.3.2 Comparison with further examples

The findings summarised in (196) provide further evidence that prosody, word 
order and morphosyntactic marking can cue information structure both in conflict 
with each other and against what is expected from context via QUD analysis. This 
suggests that they do not cue it directly, and instead do so indirectly, and/or probabil- 
istically. For prosody, this insight is represented in (196), where I take contour shape 
(rising or falling) and tonal scaling (realization as low or downstepped) to cue open- 
ness/closedness (see section 6.2.2) and phrasal prominence, respectively. We have 
seen in section 6.2.3.3 as well as here that both information status (given-new) and 
information structure (focus-background, topic-comment) affect prominence, but 
that prosody does not cue these categories directly. For word order and morpholog- 
ical marking, (196) represents them as cueing information structure directly, but 
this is only out of convenience until it is better known what exactly they contribute 
to the meaning of an utterance.⁴⁴² We saw that word order alone is not a good cue
for an IS-interpretation of constituents. In addition, while we can identify domi- 
nant strategies where speakers pick out one of the elements (location or item) to be 
specified first in order to anchor the predication involving the second of them to it 
(cf. section 6.2.2), suggesting a kind of topic-comment structure, the nature of what 
has been called a topic here seems different from that in Maptask and also Cuento. 
In those tasks, a topic is often continued across several utterances. In Conc, on the 
other hand, the topic constituent (most often the location) is selected from an avail- 
able and finite set anew (almost) each time, arguably creating a contrast relative to 
the other members of this set, and it therefore needs to be just as explicitly
specified as the element which is then put into relation with it.⁴⁴³ In this sense, the two
nominal elements here are more similar to each other than was the case for what 
was discussed as topics and foci in Maptask and Cuento, which might go some way 
in explaining the conflicting cues from prosody and morphosyntax.⁴⁴⁴ Note that


442 The paradigmatic meaning of the evidentials is of course subject of a number of works. I refer 
here to their syntagmatic meaning, i.e. what is cued by their placement at some position in an 
utterance, and not another. 

443 The beginning of the game (194) shows that also in Conc, a topical location can be omitted in 
subsequent turns once activated, as HA30 does with tsaychawmi ‘in there’. 

444 I think that introducing some of the terminology specific to copular sentences, like predica- 
tional, specificational, identificational (cf. Dikken 2006: 298-300) does not help the issue at this 
point. On the one hand, the different types of copular sentences are supposed to differ regarding 
their IS: in specificational sentences, the pre-copular NP cannot be in focus, while this is not the 
case for predicational sentences (Dikken 2006: 301; Martinović 2013: 137). However, the location-
and the item-specifying expressions in Conc are arguably both referential, making these copular 
sentences identificational (Martinović 2013: 139) rather than one of the two other types, which 
does not help for predicting their information structure. Dikken (2006: 300) also points out that 
the “topic marker” -qa does not occur even once in any of the seven Conc games analysed
here. Whatever topic-related meaning -qa signals, there seems to be no need for it 
in these discourses (it does occur in Maptask also in copular sentences, e.g. Figure 
181). It seems that the two speakers here can choose varying expressional strate- 
gies to express these contextual conditions. Recall again from section 6.2.2 that the 
majority of speaker pairs, including ZR29 & HA30, follow the order location - item 
(pattern a) in Conc, while one speaker pair out of seven (XU31 & OA32) exclusively 
prefer the reverse order (pattern b). These are thus two diverging default strategies 
of ordering the elements, perhaps reflecting two different information structural 
conceptualizations, with one treating the location as more topical, and the item 
as more focal, and vice versa. Here we saw that the one time the less frequent 
order (b) occurs in this Conc ((192)), it cooccurs with an information structure that 
is also other than expected, and probably aids in cueing this. This expectation is 
shaped by a global distribution, i.e. that the location — item pattern is more fre- 
quent across all speakers, but also by the local expectation established in the game 
between these two speakers. In XU31 & OA32's Conc, the same order of elements 
cannot be marked in the same way, because it is the exceptionless default in their 
game (even though it is globally marked across all Conc games). This suggests that 
locally and temporally established conventions between interlocutors are relevant 
for the analysis of meaning in natural conversation. While ZR29 & HA30 are with 
the majority in terms of order of elements, suggesting a broadly topical interpre- 
tation for the location-specifying expression, prosody in terms of contour shape 
paints a different picture. As seen from Table 29, five of the seven speaker pairs 
realize a clear majority of the first elements (the item-specifying expression for 
XU31 & OA32, the location-specifying expression for all others) with a finally rising
contour, and the second element, correspondingly, with a finally falling contour. 
They thus align the information structural cues from overall word order with those 
from contour shape. ZR29 & HA30 are the only ones who clearly go against this 
trend. Table 49 summarizes all of their moves from Conc, sorting them according to 


copular sentences can be ambiguous between the three readings “on paper”, which seems to mean
‘without context and intonation’, while Martinović (2013: 149) additionally claims that her analysis 
on data from Wolof puts the claimed information-structural restrictions on specificational sen- 
tences, based as usual mostly on European languages, in doubt. I suggest that at this point it is still 
too early to say anything about what types of sentences the ones here from Huari Quechua are, in 
which definiteness is not marked and in which word order in copular sentences is variable, as we 
have seen. However, I would hope that the analysis presented here, aiming to relate formal mark- 
ing to contexts, can be used to aid such further in-depth studies in the future. 
order of elements, -mi-marking, and prosody.⁴⁴⁵ Their prosodic behaviour against
the overall trend shows there as well. 


Table 49: Moves in ZR29&HA30 Conc Q sorted according to the order of elements (location-specifying 
expression, item-specifying expression, verb), marking with -mi, final contour on each element, and 
whether they are downstepped / realized as low. 


order of elements   (column headers illegible in the source; per the caption they cover marking with -mi, the final contour on each element, and downstepped / low realization of each element)

Location 4 - - - 4 = i cem = = - P" 

Location-verb 2- =- 1 1 = =. m 1 2 x 

Location-item 582 - 1 4 4 1 - = = - - = 

Location-item-verb 2 - 1 fy - 2 - IEEE - - 1 

Location-verb-item 13 6 - S f 4 8 3 10 - 2 10 

Item-verb-location m- - LE 1 - - = 1 1 - 1 

Item 4 - - - - 1 1 =- => P - 


All 31 8 1 8 19 10 13 1 5 13 1 2 13 27 25 18 


The three rightmost columns are concerned with tonal scaling. They count how 
often the three elements are realized either with a separate phrase, but
downstepped, or as only the low part of a contour. Both speakers here show a clear
tendency to realize the verb as scaled low (13 of 18 occurrences). The location-specifying
(1 of 27) and the item-specifying expression (2 of 25) are rarely downscaled, and
only in final position. In the high number of verbal forms (in 18 out of 31 moves), 
this Conc is also an exception: in the Conc corpora by the other six speaker pairs, 
only five verbal forms occur in total, of which only one is a copula used in the same 
way as here (the others are part of item-specifying expressions). The frequency of 
-mi (9 times in 31 moves) here is also exceptional compared to the Conc games by 
other speaker pairs: -mi only occurs five other times, four times in TP03 & KP04’s 
Conc, and once in that by XU31 & OA32. In ZR29 & HA30's game it mostly occurs 


445 The table also includes sequences that are not full guessing moves in the game because they do 
not specify both location and item. They are either specifications of only the item or reformulations 
after a guess (the “Item” row), or incomplete guessing moves (“Location” and “Location-verb”) that
are then followed by full moves. Table 29 did not include any of those, which is why total numbers differ
between the two tables. 

446 It is mostly HA30 who uses verbal forms at all. 
on an element realized with a falling contour, but not always, as Figure 187 shows.
There it occurs on a location-specifying expression realized with a rising contour. 
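
The counts quoted in the preceding paragraph can be turned into percentages for easier comparison. The following is only a minimal sketch: the figures are those given in the text, while the dictionary labels and the helper name `share` are chosen here for illustration.

```python
# Counts quoted in the text for ZR29 & HA30's Conc game, as (hits, total).
# The labels are illustrative and not taken from the table itself.
counts = {
    "verb scaled low": (13, 18),
    "location-specifying expression downscaled": (1, 27),
    "item-specifying expression downscaled": (2, 25),
    "moves with a verbal form": (18, 31),
    "-mi occurrences relative to moves": (9, 31),
}


def share(hits, total):
    """Return hits/total as a percentage, rounded to one decimal place."""
    return round(100 * hits / total, 1)


for label, (hits, total) in counts.items():
    print(f"{label}: {hits}/{total} = {share(hits, total)}%")
```

Read this way, the contrast between the verb and the two other expressions is stark: low scaling affects well over two thirds of the verbs but fewer than one in ten of the location-specifying and item-specifying expressions.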

While the generalization that -m(i)-marked elements do not occur postverbally
(Muysken 1995: 387) seems to hold here, utterances with -m(i) do not require the
presence of a tensed verb in our data, as Table 49 and utterances like (197) demonstrate
(pace Muysken 1995: 385; Sánchez 2010: 49).


DEM PROX COP-AG middle-LOC-ASS.

The one that is in the middle here is

Figure 187: HA30 Conc Q 0453⁴⁴⁷ (voicing threshold 0.9).


(197) KP04 Conc Q 1653 
tsay keeda-q anka-m 
DEM.DIST stay-AG  eagle-ASS 
“the one that's left [is the] eagle”


A question for future research is whether the three exceptional behaviours in 
ZR29 & HA30’s Conc (the two speakers are close friends) are related: the use of the 
copula, the use of -mi, and the preference for falling contours in initial position and
rising contours in final position. Partly, they seem to constitute individual formal 
strategies for cueing information structures that are particularly sensitive to the 
similarity in status of the two elements denoted by the location-specifying and 
item-specifying expressions in Conc, comparable perhaps to what KP04 MT Q 2834 
achieves with the conflicting cues (188) as discussed in the previous section. It also 
seems plausible that these behaviours are partly the result of their individual and 
shared multilingual acquisition histories and stylistic stances. 


447 https://osf.io/6jfz4/

6.4.4 Conclusions


We have now seen how Quechua prosody and morphosyntax are used to structure 
utterances in more complex contexts than in those compatible with broad focus 
in Cuento. In the Maptask, word order, morphological marking and prosody were 
seen to mostly align in cueing information structural roles for constituents, relating 
them in a complex fashion to the context. Yet the Maptask and especially the Conc 
data showed that the cues can also not align, leading to ambiguous interpretational 
possibilities likely pointing to a more differentiated picture than just a separation 
between focus-background and topic-comment. Possibly, information structure in 
terms of these two dimensions is best thought of as an interpretational condensation,
emerging from heterogeneous cues that contribute other information as well.⁴⁴⁸
Regarding prosody, the observations here are difficult to reconcile with a view that 
aspects of it signal either of these dimensions directly: instead, I have argued that 
it makes sense to see prosody as signaling 1) whether units are together or apart at
different levels of structure (via phrasing and scaling), 2) the relative prominence
at each of these levels (via scaling both within and across phrases), and 3)
whether a unit should be understood as complete or incomplete (via tonal contour
choice). None of this signals focus or topic, but assuming that competent speakers
have expectations about how prosodic structures are built, they can exploit them to 
direct the interpretation of equally competent listeners towards intended informa- 
tion structures. Word order and morphology, which were treated more peripherally 
here, clearly also contribute to this complex interpretation. They are constrained 
by their own structures and also build up expectations based on them. Here they 
have been seen to not straightforwardly signal focus or topic, either. As amply seen, 
word order is only a good cue for information structure when combined with scal- 
ing-based cues. Regarding the evidentials, the prediction that only preverbal ele- 
ments can be marked with them seems to hold across varieties of Quechua, but that 
means neither that focus is always preverbal nor always marked by an evidential, 
nor even that only a constituent that could be marked by an evidential can be inter- 
preted as focal in context (postverbal elements are not -mi-marked but were shown 
to be focal). The constraint against postverbal evidential-marked elements seems 


448 Matić & Wedgwood 2013 express the more radical view that focus exists only as an interpreta- 
tional category for linguistic analysis and has no reality in actual language, especially crosslinguis- 
tically. I am thinking more along the lines of what Gunlogson (2001, 2008) has shown for declarative 
questions in English, in which the cues for polar questionhood from word order (subject-verb in- 
version) and prosody (final rise) productively misalign for the expression of a speech act category 
that has more marked contextual specifications than either assertions or neutral polar questions 
(cf. also section 5.1.2.2).

purely morphosyntactic, not one of information structure. Weber's (1989: 427) statement
that the evidentials do not always mark the focused constituent thus seems to
hold, even apart from their likely specialized occurrence in certain types of confir- 
mation-seeking questions reported by Floyd (1996, 1997). Here we've only discussed 
the syntagmatic function of the evidentials, mostly disregarding their semantics. 
Studies of them (Faller 2002, 2003, 2011, 2014; Behrens 2012; Bendezú Araujo 2021)
show that they likely interact with focus, but that they do not signal it directly as 
part of their meaning. 

It seems that word order and morphology, just like prosody, make contribu- 
tions cueing constituents towards an interpretation as focal or topical, but this is 
not a categorical function, and it is accompanied by other effects they have. Drawing
this conclusion is only possible because of the adopted context-based view of infor- 
mation structure proposed by Riester and colleagues (cf. section 3.7.2), allowing for 
a strict separation of information structure from formal means of expression. The 
analysis of the data here has shown that occasionally, the formal means suggest 
a different information structure than that arrived at via implicit QUD analysis 
from the context. This allows for situations in which it is a speaker's intention to 
actively influence their interlocutor's interpretation of an utterance against what 
could be expected from the discourse context. The fact that such situations exist 
consequently means that the model by Riester and colleagues cannot uncover all 
subtleties of information structure in all contexts. Yet coming to this conclusion is 
only possible because a context-based approach allows us to first establish default 
correspondences between contexts and means of expression empirically, so that 
marked transgressions against them can then be identified. This would be concep- 
tually impossible under an approach that treats information structural categories 
as formal markers themselves instead of context configurations, because congru- 
ence between context and the presence of such marking can then only ever be a 
coincidence. 

All of this leads to a picture in which prosody, word order and morphology 
can either align or disalign in their cueing of information structure. The latter 
probably does not mean that no interpretation is possible, but rather that it is 
more detailed than just imposing a uniform focus-background and topic-comment 
structure. In order to understand this better, it should be worthwhile distinguish- 
ing more accurately at which level of structure each type of expression system 
contributes cues that are then probabilistically weighted (cf. Calhoun 2010a, 2015) 
for an interpretation in terms of these two dimensions, but not only in those terms. 
Venditti et al. (2008: 505) make a similar point regarding different aspects of Japanese
prosody:

[. . .] phonological relationships such as the contrast between an IP boundary and a mere
AP boundary are grammatical abstractions that native speakers acquire in the course of 
extensive exposure to a rich variety of language-specific cues. [. . .] A frequently encoun- 
tered congruence of cues from multiple sources in the signal induces a stronger abstraction, 
which allows adult native speakers to produce conflicting cues when necessary to convey 
the intended information structure [. . .]. The native listener, conversely, can accommodate to 
such conflicting cues to recover both the intended focus pattern and the syntactic grouping 
that the prosodic phrasing also cues. [. . .] [T]he next generation of models of intonational pho- 
nology needs to do better justice to these complex interactions among discrete contrasts (e.g., 
between accented and unaccented words) and continuous variation (e.g., more versus less 
extreme degrees of pitch range expansion or compression at the edges of focus constituents) 
and the ways that native speakers and listeners take advantage of the statistical dependencies 
among different patterns. 


I would suggest that this view can be extended to the interaction between cues not
only from prosody. In a bilingual community such as the one investigated here, the
availability of resources from two languages affords speakers even more possibilities
to heterogeneously signal complex structures. I have barely scratched the
surface of this, but I am confident that future research will be able to uncover far
more here, using more sophisticated kinds of modeling (e.g. Boersma et al. 2020) in
order to paint a more realistic picture of how prosody helps structure information
in actual conversation in a multilingual setting.
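
The notion of probabilistically weighted cues invoked in this section can be made concrete with a toy calculation. The sketch below is purely illustrative and is not a model fitted to the data of this study: the cue names, the weights, the bias term and the logistic combination are all assumptions made here. The only element taken from the discussion is that scaling is given more weight than position or morphological marking.

```python
import math

# Hypothetical weights (not estimated from any data): positive values favour
# a focal reading of a constituent, negative values disfavour it.
WEIGHTS = {
    "highest_scaling": 2.0,   # constituent carries the highest-scaled phrase
    "phrase_final": 1.0,      # constituent is aligned with a right PhP boundary
    "marked_mi": 0.8,         # constituent carries the suffix -mi
    "downstepped": -2.0,      # constituent is realized in a downstepped phrase
}
BIAS = -1.0  # hypothetical baseline when no cue is present


def p_focal(cues):
    """Combine the cues that are present into a probability of a focal reading."""
    score = BIAS + sum(WEIGHTS[name] for name, present in cues.items() if present)
    return 1 / (1 + math.exp(-score))


# All cues point in the same direction.
aligned = {"highest_scaling": True, "phrase_final": True,
           "marked_mi": True, "downstepped": False}
# Position and morphology favour focus, but scaling goes against it.
conflicting = {"highest_scaling": False, "phrase_final": True,
               "marked_mi": True, "downstepped": True}

print(p_focal(aligned), p_focal(conflicting))
```

In such a setup, conflicting cues do not block interpretation; they merely shift the probability of a given reading, which is the intuition behind treating information structure as cued indirectly rather than marked categorically.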


7 Synthesis 


This final chapter first summarizes the results on the prosody of Huari Quechua 
and Huari Spanish individually, and then presents some conclusions on how they 
can be brought together. I will begin by summarizing the findings on Huari Quechua 
and then move on to Huari Spanish. 


7.1 Huari Quechua result summary 
7.1.1 Word prosodic profile of Huari Quechua 


In answer to research question (34)d about the influence of prominent positions 
at the word level on pitch events in Huari Quechua, the analysis in this study has 
found Huari Quechua to be an example of a language that can be said to “care”
very little about word stress in the sense of Hyman (2014). That is to say, there is 
evidence for stress, but it is quite subtle. Of the suprasegmental cues for stress, it 
is known from the literature that in this variety of Quechua duration is employed 
for lexical distinctions independent of stress, non-culminatively and non-obligato- 
rily. Pitch contours have been found to be assigned at the level of the phonological 
phrase (PhP) / accentual phrase (AP), as e.g. in Japanese, French, or Korean, with 
a rate of 1.27 lexical content words per such phrase in a sample of 457 words (cf. 
section 6.2.3.2). Phrase-final tonal movements show temporal alignment behaviour
that is more indicative of boundary tones than of pitch accents associated
with word stress (cf. section 6.1.6). Duration does often seem to serve as “phonetic
enhancement” of the syllable on which the phrase-final peak is realized, but this
is independent of whether that syllable is the word penult or not. Consequently, in 
accordance with the theoretical framework laid out in section 3.5, the assumption 
is that the H tone at least sometimes associates with the syllable on which it is real- 
ized, but since this is not related to a metrically strong position, alignment rather 
than association constraints create the relevant differences between the variants in 
the OT-analysis. Observations of phrase-final devoicing patterns (cf. section 6.1.3) 
further support the conclusion that tonal events are due to boundary phenomena, 
since they show the same final tonal movement as fully voiced phrases, but “displaced”
to the left, with the putatively stressed word penult position often realized
low or entirely devoiced. 

However, stress has also been shown not to be entirely disregarded: in a subset
of the tonal alignment patterns identified (the word penult-variant), the position at 
which a tonal transition occurs (from L to H or vice versa) must make reference to the 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

penult of a word, so that in these patterns, the word penult should be seen as prominent
or stressed insofar as it serves as the anchoring point for tonal movement. No
evidence for prominence on the initial syllable of words could be found. In one of the
alignment configurations, initial syllables of a word were found to be high. However, 
this is better explained (and modeled in the OT analysis as such) via alignment of the 
H tone to the word boundary. This is a better analysis because falls are also attested 
at boundaries between words. By assuming alignment to a word boundary, these 
two phenomena can be explained via a single mechanism in the OT analysis. Fur- 
thermore, initial syllables were only ever found to be high as part of a high plateau 
extending across the entire word, never as the only high syllable, whereas the word 
penult was found to be the only high syllable in a subset of realizations. In sum, 
this suggests that the word penult is a possible and variable anchoring site for tonal 
transitions, which seems to be the only way that regular word stress manifests in this
variety of Quechua. 


7.1.2 Observed patterns of intonational variation 


A lot of intonational variation has been found in the data, much of it without 
accompanying identifiable differences in meaning. I summarize the main strands 
of variation that were identified. 


7.1.2.1 Three tonal patterns 

The Huari Quechua tonal inventory (cf. research question (34)a) consists of three 
observed PhP/AP-level tonal patterns: a rising (LH), a falling (HL), and a rising-fall- 
ing (LHL) pattern. In terms of paradigmatic choice for the signaling of pragmatic 
meanings (cf. research question (35)a), the distribution of these patterns is only 
marginally influenced by utterance type, with declaratives, wh-questions, and 
polar questions not mainly differentiated by them. Some evidence (section 6.2.1) 
indicates that within polar questions, contour choice of either falling or rising interacts
with tag particles such as aw and suffixes such as -ku to differentiate neutral
from various types of biased questions. 

The rising pattern was found to be linked with what can broadly be labeled as
“incompleteness” or “continuation” (section 6.2.2): this includes indication that 
a topic or turn is to be continued, that a coherent unit has not ended, or that no 
separate discourse commitment is made. The falling patterns (with no differentia- 
tion found between the two subpatterns) are related to “completeness” or “finality”: 
indication that a topic, turn, or other coherent unit is finished, or that a discourse 
commitment has been made. The relation between these meanings and the tonal
patterns was only found to be statistical, with context serving for specification and
speakers’ stylistic choices also playing a role. The rising pattern was found on a little
more than half of all phrases in a subset of the data consisting only of nominal elements
(section 6.2.3.2). The rising-falling pattern makes up a little less than a third,
and the only-falling pattern, the least frequent, the remainder. In all patterns, a tendency
exists to create long stretches of pitch at effectively the same level, in particular
also longer high plateaus extending across several syllables, with pitch either not
declining or declining only very gradually.


7.1.2.2 Alignment variants 

The three tonal patterns are all observed to follow several different alignment vari- 
ants, formalized in an OT-analysis. This analysis (section 6.3) showed that the align- 
ment variants can be differentiated mainly by three factors: 1) whether tones align 
only with the boundaries of prosodic units or additionally with a word stress posi- 
tion, 2) whether the H tone is allowed to align with more than one edge, 3) whether 
the word stress position in loan words from Spanish is ignored or not. 

In the word-boundary variant, tonal transitions (between L and H in the 
rising pattern, between H and L in the falling contour, and between both in the 
rising-falling pattern) occur at word or phrase boundaries, i.e. tones are aligned 
only with prosodic edges. This variant is further differentiated by whether the H 
tone is allowed to align with more than one edge or not, i.e. whether some of the 
constraints effecting its alignment to more than one edge dominate the constraints 
effecting the multiple alignment of the L tone. Contours are here represented schematically,
with each dashed-line arrow representing multiple alignment of the
tone it originates from. Its head marks the position to which its alignment
(and the pitch level it effects) is allowed to extend, based on the constraint ranking 
underlying each variant. 


Rising patterns (without multiple H alignment) 
(198) 


pinkullupa hananpita (pinkullu-pa hana-n-pita / flute-GEN above-3-ABL /
“from above the flute”)


Rising patterns (with multiple H alignment) 
(199) 


pinkullupa hananpita
(200)
pinkullupa hananpita


Rise-falling patterns (without multiple H alignment) 
(201) 


pinkullupa hananpita 


Rise-falling patterns (with multiple H alignment) 


(203) 
L-------------- ><---------- HL 
pinkullupa hananpita 
(204) 
LH----------- ><------------ L 
pinkullupa hananpita 
(205) 
L------------- >HL 
pinkullupa hananpita 


Only-falling patterns (with multiple H alignment) 


(206) 
H------------- ><------------ L 
pinkullupa hananpita 
(207) 
H--------------------------- >L 


pinkullupa hananpita 


If tones cluster at the right edge of a phrase, the falling patterns can create contours 
in which the phrase-final penult stands out because it is high, while the following 
(and preceding, in (201)) syllable is low, due to tonal crowding constraints. These 
cannot normally be differentiated from similar contours arising from the word 
penult-variant, in which the word penult is specified as the place at which the H
tone is aligned. However, cases were observed where phrase-final devoicing takes
place, sometimes up to and including the penult of the final word. In those, the
pattern was found to be displaced to the left (cf. (205)): this is only explicable with
an edge-seeking account. The contour in (201) is the most frequent rise-falling
contour observed (cf. section 6.2.3.2). That is additional evidence that it arises from 
both the word boundary- and the word penult variant. 

In the word penult-variant, which constitutes the only instance of detectable 
word stress on native Quechua words in Huari Quechua, tonal transitions (between 
L and H in the rising pattern, between H and L in the falling pattern, and between 
both in the rising-falling pattern) occur at the edges of the penult of a word, which 
is taken to be the entirely regular word stress position, with the domain of stress 
assignment including all suffixes attached to a root. In this variant the H tone 
always aligns with multiple edges if possible. 


Rising patterns 
(208) 


pinkullupa hananpita 


Rise-falling patterns 


(210) 
L------------------------ >HL 
pinkullupa hananpita 
(211) 
L------->H L---------------- > 
pinkullupa hananpita 
Only-falling patterns 
(212) 


pinkullupa hananpita 


Not all contours allow us to determine which variant they belong to. Contours (210)
and (212) are identical in form to (201) and (207), respectively. In contrast, the
pairs (198) and (208), and (202) and (211), differ in the alignment
of the H tone by only one syllable, so that either the penult or the final syllable
is realized with a H tone. This suggests that the quantitative findings on alignment
in section 6.1.6 can be partially explained by the word-boundary and the word-pe- 
nult variant occurring together in the sample. 

No explanation in terms of function or meaning for the two alignment vari- 
ants on native Quechua words was found (cf. section 6.1). They also do not seem 
to be preferentially used depending on the type of experiment. Regarding their 
relative frequency, it has to be noted that the word boundary-variant can in indi- 
vidually phrased words only be identified by the occurrence of the equivalents of 
(198), (200), and (205), and the word penult-variant only by that of the equivalent of
(208). In such phrases, it is thus to a large degree impossible to say which contour 
is more frequent. In a sample of multi-word phrases (cf. section 6.2.3.2), the word 
penult-variant was overall found to be more frequent than the word boundary-var- 
iant, and considerable differences were found between speaker pairs in separate 
experiments, ranging from 0% to 74% occurrence of the word boundary-variant in
multi-word phrases. Thus it seems that use of the variants is at least partially due 
to individual preferences. 

The marked variants on Spanish loanwords are distinguished by tonal align- 
ment oriented according to the position of the syllable bearing word stress in 
Spanish, instead of alignment to word or phrase boundaries, or the entirely regular 
word stress position of the penult. Two patterns are distinguished: the “inherited”
pattern, and the “grafted” pattern. The “inherited” pattern functions like the word
penult-variant, except that the position to which the H tone is aligned is determined
by the syllable which bears word stress in Spanish (213). In the “grafted” pattern,
two separate tonal movements occur: phrase-finally, a rise takes place, aligned on
the penult of the phrase-final word, like a rising pattern in the word penult-variant.
Additionally, a rising-falling movement occurs on the syllable bearing word stress
in Spanish. This has been labeled as a LH* pitch accent, like in Huari Spanish, with
the following low coming from what is usually the phrase-initial L tone of Quechua
(214). 


“inherited” pattern 
(213) 


abejakunapa (abeja-kuna-pa / bee-PL-GEN / “by the bees”)


“grafted” pattern
(214) 
LH* L-—H-» 
abejakunapa

The marked loanword variants are not categorical on Spanish loanwords (cf. section
6.1.8.3). More frequently, loanwords from Spanish are realized with the same into- 
nation as native Quechua words, i.e. either with tonal alignment according to the 
word boundary-variant or the word penult-variant, instead of aligned according 
to the position of the stressed syllable in the Spanish word. The incidence of the 
marked loanword variant (“inherited” and “grafted” together) differs greatly both 
between speakers and between lexical items. The highest rate of its use was found 
to be nearly 60% for one speaker in one of the experiments; most other speakers 
were found to exhibit far lower ratios, some not producing a marked token at all 
despite using several Spanish loanwords. Overall, Spanish loanwords are realized 
with a marked variant in somewhat less than a fifth of all cases, with nearly half 
not aligned according to the Spanish word stress, and the remaining third indeter- 
minable. Amongst the same speakers, different lexical loanword items exhibited 
considerably different rates of being realized with the marked variants, suggesting 
that lexical identity also plays a role. 


7.1.3 The role of prominence and information structure 


Regarding the question of how meaning-related categories, especially IS, are cued 
by Quechua prosody (question (35)), this study has found that this is mostly achieved 
syntagmatically. In terms of which word will be realized with a high tone in a mul- 
ti-word phrase, the findings suggest that this is due to which word is most promi- 
nent in such a phrase. Intonational cues for prominence and their relation to infor- 
mation status and structure were investigated on a subset of nominal sequences. In 
phrases with the rising and rising-falling pattern, final prominence was assumed 
to be cued only when the entire high stretch was within the final word. In only-fall- 
ing phrases, final prominence was assumed to be cued only when the tonal tran- 
sition occurred on the penult in the final word. If a nominal sequence extended 
across several phrases, prominence was assigned to the phrase with the highest 
excursion only if all phrases were of the same tonal pattern. This was compared 
with the relative information status (givenness/newness) of the individual words 
in the sequence and their information structure (focus/background) as determined 
from an analysis of the discourse context. The results support the hypothesis that 
prominence is final by default, as in Spanish. They also suggest that a deviation 
from this default is a marked strategy for signaling prefinal narrow focus or prefi- 
nal newness followed by given material (cf. section 6.2.3.3). The relation between 
phrasal prominence and information structure does not seem categorical, but
instead probabilistic/distributional, as we saw to be the case for Spanish from the
literature, and also for Cuzco Quechua (cf. the discussion in section 3.7.3). The relation
between prosodic cues and phrasal prominence itself certainly needs further
research. Assigning prominence across several phrases of the same tonal pattern 
according to highest excursion implies the assumption of a larger prosodic unit 
encompassing those phrases among which prominence is culminative. 
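
The coding criteria for prominence described in this section can be restated as a small decision procedure. The following is only a sketch of those criteria as summarized here: the class `Phrase`, its field names and the numeric `excursion` value are hypothetical conveniences, not the annotation scheme actually used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    """One PhP/AP in a nominal sequence (all field names are hypothetical)."""
    pattern: str                              # "LH", "LHL" or "HL"
    high_within_final_word: bool = False      # LH/LHL: entire high stretch lies in the final word
    transition_on_final_penult: bool = False  # HL: transition falls on the penult of the final word
    excursion: float = 0.0                    # pitch excursion, in any consistent unit


def cues_final_prominence(p: Phrase) -> bool:
    """Apply the per-phrase criterion for final prominence."""
    if p.pattern in ("LH", "LHL"):
        return p.high_within_final_word
    if p.pattern == "HL":
        return p.transition_on_final_penult
    raise ValueError(f"unknown tonal pattern: {p.pattern!r}")


def most_prominent_phrase(phrases: List[Phrase]) -> Optional[int]:
    """Return the index of the phrase with the highest excursion, but only if
    all phrases share the same tonal pattern; otherwise no decision is made."""
    if not phrases or len({p.pattern for p in phrases}) != 1:
        return None
    return max(range(len(phrases)), key=lambda i: phrases[i].excursion)
```

Returning `None` for sequences with mixed tonal patterns mirrors the restriction stated above: excursions are only compared across phrases of the same pattern.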

The relation between information structure, prosody, and morphosyntax (word 
order and the use of the so-called evidential suffixes and -qa, the topic suffix) was 
also further investigated in a qualitative analysis of individual utterances in their conversational
contexts (section 6.4), to complement the broad results of the preceding
quantitative analyses (section 6.2.3.3). The findings from this analysis support the 
suggestion that tonal scaling at the level of local pitch span serves to demarcate 
larger prosodic units encompassing several PhPs, in that prefinal PhPs are realized 
with comparatively smaller pitch span than the final PhP. Such larger prosodic units 
with the greatest excursion on the final PhP were regularly found on utterances in 
broad focus contexts in the narrative task Cuento (section 6.4.1), again supporting 
the assumption that default prominence is rightmost, also at that higher prosodic 
level. No evidence was found that such a larger phrasal unit comes with its own sep- 
arate boundary tones, so that in answer to questions (34)c and (35)d it is suggested 
that the PhP is recursive in Quechua and itself the largest unit. The examples ana- 
lysed there suggest that phrasing and scaling of these larger phrasal units interact 
with syntax to signal discourse coherence. An analysis of examples from Maptask, 
for which context suggests a more complex utterance-internal information struc- 
ture, supports the hypothesis that prosody cues information structural roles indi- 
rectly. Constituents interpretable as focal from the context were found to be most 
often aligned with right boundaries of PhPs. The findings suggest that a reduction in 
scaling of pitch level (downstep) from one phrase to the next cues a deviation from 
rightmost prominence, so that the last higher scaled phrase has highest prominence. 
This was found to occur in contexts that suggest an information structural separa- 
tion such that the boundary between the last higher scaled phrase and the down- 
stepped one aligns with the division between at-issue material and backgrounded 
material. These findings support the relation between material aligned with a high 
tone and prominence from section 6.2.3.3 and extend it to units larger than indi- 
vidual PhPs. Contour shape is also shown to cue information structure indirectly 
in accordance with the proposal made in section 6.2.2 that rising contours signal 
openness and falling contours completeness. The findings in this section were put 
into relation with proposals for a pattern or template by Weber (1989) and Sánchez
(2010), which connects information structural roles with positions and morphologi- 
cal marking in a sentence, extending those proposals from a prosodic perspective. 

In section 6.4.3, these findings were further complemented by an analysis 
which compared the individual contributions to cueing information structure 
from prosody, morphology, and word order against what is interpretable from the
context using a number of examples from the Conc corpora. It was shown that the
cues from different domains are quite often in conflict with each other, and sug- 
gested that this might reflect that the information structural roles of referents in 
Conc are underspecified by the notions of topic-comment and focus-background. It 
was found that scaling as a cue to prominence is probably quite reliable, even when it
goes against contour type or word order. The picture of the relation between infor- 
mation structure and the cues to it from different formal domains that emerges is 
problematic for a view which takes information structural roles to be formal fea- 
tures that are categorically expressed via some kind of marking, and far more con- 
sistent with a view in which information structure is cued only indirectly and prob- 
abilistically, as laid out in the theoretical considerations in section 3.7.3. It enhances 
this view by exploring how each of the domains cueing information structure is 
subject to formal restrictions particular to that domain, and how the cues from the 
different domains can be seen to align and misalign to convey complex informa- 
tion structural configurations. Methodologically, it would not have been possible 
to make these observations if the principled contextual perspective on information 
structure from Riester et al. (2018), Riester & Shiohara (2018), and Riester (2019) had not
been adopted to serve as a third of comparison.


7.2 Huari Spanish result summary 


7.2.1 Tonal inventory and encoding of pragmatic meaning via paradigmatic 
contrasts in Huari Spanish 


The analysis of Huari Spanish has found a comparatively small inventory of para- 
digmatic tonal contrasts. Even though declaratives and interrogatives in pragmati- 
cally diverse discourse contexts were investigated, it seems likely that only a single 
bitonal pitch accent LH* is used in all of them. It is realized via a low elbow in the 
pretonic syllable, followed by a peak early or in the middle of the tonic vowel. This 
LH* pitch accent is thus regularized to prenuclear and nuclear position in declara- 
tives, neutral and biased polar questions, alternative questions, and wh-questions, 
in marked difference to what has been described for many other Spanish varieties, 
including the “Andean Spanish” of Quito in Ecuador (cf. O'Rourke 2010; Hualde &
Prieto 2015), but more similar to the findings on Cuzco Spanish by O'Rourke (2005). 
This study did not systematically investigate a number of pragmatic meanings that 
have been previously found to exert an influence on prosodic form, like e.g. imper- 
atives and vocatives, so any conclusions about the full paradigmatic inventory of 
tones must be preliminary. However, we can compare the variation in nuclear con- 
figurations in Huari Spanish with those of Table 3 for Peninsular Spanish across 
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the same range of meaning, i.e. statements with finality, neutral polar questions 
and requests with confirmation bias (corresponding to a, b, f, g, in Table 3). Within 
this range, Huari Spanish was found to have only two nuclear configurations (LH* 
L% and LH* LH% under analysis II in section 5.1.2.1), while Peninsular Spanish 
has four (LH* L%, L* L%, L* H%, LH* HL%).⁴⁴⁹ The only non-phonological context
tentatively identified as changing pitch accent realization was when an utterance 
carried an additional non-at-issue meaning of challenging a salient presupposition 
in the context, in both assertions and clarification questions. In those cases, both 
the elbow preceding the peak and the peak itself on the constituent denoting the 
contested referent were considerably delayed. This could possibly be identified as 
a pitch accent different but derivable from LH*, like L*H or L*«H*, but further 
research is needed. 


Table 50: Interactional meanings/functions and their encoding via formal means that include 
intonation in Huari Spanish and Quechua. 


pragmatic function                        Spanish                                  Quechua

continuation                              LH* H-                                   LH
finality                                  LH* L%                                   (L)HL
declarative                               LH* L%                                   LH/(L)HL
neutral polar question                    LH* LH%                                  -ku LH/(L)HL
confirmation-seeking question             LH* + tag LH%                            [(L)HL (-ku)]φ [LH tag]
                                          (tag e.g. no)                            (tag = aw)
clarification request for intended        LH* L%                                   LH
content with confirmation bias
question with [alternative 1]             [(LH*) LH* H-]ip                         [LH -ku]φ
[alternative 2] (alternative question)    [(L̶H̶*) L̶H̶* L%]ip                        [(L)HL -ku]φ
wh-question                               [LH*_wh-word (LH*)]ip ([L̶H̶*]ip) L%       ... [(L)HL]φ_wh-word ...
presupposition reversal                   delayed peak on LH* L%                   ?


The boundary tone inventory was also found to be comparatively small. Under the 
analysis which assumes an LH* pitch accent for all types of declaratives and inter- 
rogatives considered (cf. section 5.1.2.1), three boundary tones were identified, two 
monotonal ones (H-/% and L-/%), and a bitonal one (LH-/%). Table 50 summarizes 
the findings on how different pragmatic meanings apart from information structural
separations are encoded via formal means that at least include intonation in
Huari Spanish and Quechua.


449 According to O'Rourke (2010: 231-242), “Ecuadorean Andean Spanish” (based mostly on
speech from Quito) has five different nuclear contours in this meaning range: L* L%, LH* L%,
L* HH%, LH* HH%, L* HL%. That once again puts the homogeneity of what has been called
“Andean” Spanish into question (cf. section 2.2).

Table 50 shows that purely on the basis of paradigmatic choice between what 
could be called nuclear configurations (cf. research question (35)a), Huari Spanish 
only has a three-way distinction. The differentiation between neutral and con- 
firmation-seeking polar questions is achieved by the addition of a tag, while that 
between declaratives (assertions) and clarification requests with confirmation bias 
is likely purely achieved by context. Quechua makes even less use of paradigmatic 
tone contrasts, effectively only differentiating between continuation and finality 
(the results on the types of biased questions being rather preliminary). Instead, 
further morphosyntactic means like the polar question suffix -ku are employed,
but context here also plays a crucial role. 


7.2.2 Syntagmatic intonational contrasts: Deaccentuation and phrase 
accentuation 


In contrast, syntagmatic intonational contrasts across utterances (cf. research ques- 
tion (35)b) were found to play a very important role in Huari Spanish. On the one 
hand, deaccentuation after the accent with the strongest metrical position is used 
to differentiate between the two alternatives in alternative questions in almost all 
cases and also very often at some point after the wh-word in wh-questions (cf. Table
50). Deaccentuation was also found to occur on the noncontested material in partial 
denials/reversals. It did not occur on material that is only given following new 
material without a reversal. But it also occurred on other non-at-issue material fol- 
lowing at-issue material, like postposed evaluatives or parentheticals. On the other 
hand, a marked accentuation mode labeled phrase accentuation was observed. On 
a larger prosodic unit containing several accentable prosodic words, only the final 
one receives a full pitch event, either in the form of an LH* pitch accent on the 
final stressed syllable (phrase accentuation I), or as a rising boundary tone (LH) at
the right edge of the phrase (phrase accentuation II). This contrasts with previous 
descriptions of Spanish that assume a pitch accent on virtually every accentable 
(content) word (Hualde & Prieto 2015: 358) and also with what can be assumed 
to be the default or main accentuation pattern in the Huari Spanish data, where 
a pitch accent was found on nearly 90% of all content words in one sample (cf. 
section 5.1.1.2). It is much more similar to English or German, where a pitch accent 
usually only occurs on the most prominent prosodic word in a phrase. The phrase 
accentuation was found to occur in contexts in which it is plausible to assume an 
information structural partition such that the phrase which only has a final pitch 
event is either not internally partitioned in a relevant way (broad focus), or partitioned
with narrow focus on the final element, both of which correspond to a
metrical structure with rightmost prominence. It is thus phrase-optimizing in the 
sense that it makes the pitch event culminative in a phrase, and again similar to 
German or English in that this pitch event is located on the most prominent word. 
A third marked accentuation pattern was found only marginally. It is a plateau-like 
realization whereby pitch remains at a high level across a number of prosodic 
words in a phrase after rising on the first accentable syllable. This pattern is a rare 
variant and was systematically employed only by few speakers. Huari Spanish was 
thus found to exhibit two complementary syntagmatic intonational strategies for 
marking prominence contrasts that can be exploited for cueing relevant information
structural partitions: deaccentuation, which compresses postnuclear pitch
accents and thus can be used to cue at-issue – non-at-issue partitions, and phrase
accentuation, which compresses prenuclear pitch accents and can be used to cue
non-at-issue – at-issue partitions. They have in common that they impose a partition
on an utterance based on postlexical metrical structure, marking out parts
of that utterance as coming before (phrase accentuation) or after (deaccentuation)
the most prominent element. Both accentuation modes come at the expense
of word-optimizing prosody. The difference between phrase accentuation I and II
could in addition be described as moving between head-marking and edge-marking
in the typology of Jun (2005d, 2014b). The phrase accentuation was not only found
as a marked mode for the cueing of information structural contrasts, but for some 
speakers, it also seemed to constitute a kind of default. This was found to have con- 
sequences for how the prosodic units that structure utterances are cued. 


7.2.3 Hierarchical scaling contrasts and recursive prosodic structure 


Tonal scaling was found to play an important role in Huari Spanish intonational 
phenomena (cf. research question (35)d). Firstly, it is an essential part of the reality 
of the two syntagmatic intonational phenomena phrase accentuation and deaccen- 
tuation. The nonprominent pitch accent positions that are suppressed in them were 
found to exhibit variable degrees of tonal compression, ranging from moderate, but 
with still notably less local excursion than the pitch event at the prominent posi- 
tion, to total reduction, with no identifiable local excursion remaining. 

Secondly, tonal scaling was found to be employed systematically in Huari 
Spanish for the cueing of hierarchical prosodic structure in a set of complex utter- 
ances. They were all double-topic constructions, consisting of two sequences of 
topic-comment each (cf. section 5.2). Tonal scaling is inherently a relative phenom- 
enon, and the scaling of a local pitch event is relative with reference to at least two 
different values: the scaling of the utterance in which it occurs and other local pitch
events in that utterance, and the usual range of a speaker in a given social context.
It therefore lends itself quite naturally to the cueing of prominence, which is also 
inherently relative. Tonal scaling on pitch accents was found to be significantly 
affected by whether a pitch accent was part of the first topic, the first comment, 
the second topic, or the second comment, with the first topic scaled higher than the 
first comment, and the second topic higher than the second comment. Crucially, the 
scaling difference between first comment and second topic was smaller on average 
than that between those other groups, but first comment and second topic were 
clearly separated by a boundary tone in most cases. These findings are incompat- 
ible with a simple local downstep model of tonal scaling. Instead, the number of 
pitch levels distinguished, the domain across which a local level is maintained, 
and the scaling difference that is sensitive to the type of utterance subpart it sepa- 
rates all suggest that tonal scaling here cues a hierarchical prosodic structure with 
several levels of embedding. This prosodic structure is subject to constraints that 
seek to faithfully map it to the information structural units involved on the one 
hand, and to ones seeking to reduce the number of its levels, on the other. It was 
found that only with a sufficient degree of length and complexity of the utterances 
involved did the first of these win out over the second, suggesting that genuinely 
rhythmic constraints play a crucial role. It was argued that the prosodic structure 
that best models these phenomena is recursive at the IP-level, to my knowledge the 
first time that this has been claimed on empirical grounds for Spanish. 

In the discussion of a set of examples by three speakers that deviated in their 
scaling behaviour, it was shown that they employed phrase accentuation in these 
utterances. While this had the consequence of reducing the number of contrasts 
that could be achieved via local pitch accent scaling, it was shown that they use 
other cues, including boundary tones and their scaling as well as systematic dif- 
ferences in the phrasing of the constituents of the first and the second topic, to 
signal effectively the same prosodic and metrical structure. They were also the only 
speakers found to make occasional systematic use of the plateau-like realization, in
that they used it on the second topic, while the first one was realized with individ- 
ual pitch events on each constituent. These findings suggest that different cues, in 
particular pitch level scaling and boundary tones, can be equivalent for the signaling
of prosodic structure.


7.2.4 Relation between the observed variants


In an OT-analysis, the relationship between the different accentuation variants
(main accentuation and the two phrase accentuation variants) was explored. It
could be demonstrated that the main accentuation and the first phrase accentuation
variant only differed with regards to the number of minimal tone sequences
of LH that are assigned per phonological phrase: while the main accentuation has 
one LH sequence per prosodic word in the PhP, the phrase accentuation has one 
per PhP. The different phrase accentuation variants, in turn, differed with regards 
to the ranking of their alignment and association constraints. The H tone that is the 
starred tone of a pitch accent in the main accentuation and phrase accentuation I 
is a boundary tone aligned with the right edge of the phonological phrase in phrase 
accentuation II. The plateau-like realization was shown to result from a rerank- 
ing of two constraints in the phrase accentuation. The flexible nature of the tones 
involved was further demonstrated when IP-level boundary tones were considered:
in IP-final position, the H tone that is an edge tone in non-final PhPs in the
second phrase accentuation variant becomes the starred tone of a pitch accent again,
at least in final paroxytones, because it is pushed to that position by the encroaching
IP-boundary tone. This result, as well as the findings on the variants observed
for Quechua, supports the hypothesis that tones are at some level autonomous and
independent of their relations with the segmental chain (cf. sections 5.3, 6.3). In
principle, this has consequences for the convention in much of the literature
of giving the intonational inventory of a language as a list of paradigmatic choices
of nuclear configurations in which these relations are fixed. Yet the fact remains
that tonal alignment differences can also encode differences in pragmatic meaning, 
in Huari Spanish as elsewhere (cf. section 5.1.3.3, Fliessbach 2023), but not in Huari 
Quechua, as far as we can tell. This poses an empirical and theoretical challenge 
for the future. 
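
The difference in tone sequence distribution between the main accentuation and the phrase accentuation described above can be illustrated with a minimal Python sketch. It is only a toy illustration and not the OT analysis itself: the function names, the mode labels "main" and "phrase", and the example words are assumptions made for this sketch, and the placement of the single LH sequence on the final prosodic word merely follows the description of phrase accentuation in section 7.2.2.

```python
from typing import List

# A phonological phrase (PhP) is modelled as a list of prosodic words.
# This representation is an assumption of the sketch, not part of the analysis.
PhP = List[str]


def assign_lh_sequences(php: PhP, mode: str) -> List[str]:
    """Return one tonal label per prosodic word of the PhP.

    mode="main":   one minimal LH sequence per prosodic word.
    mode="phrase": one minimal LH sequence per PhP, on the final prosodic word.
    An empty string marks a word without its own tone sequence.
    """
    if mode == "main":
        return ["LH" for _ in php]
    if mode == "phrase":
        if not php:
            return []
        return [""] * (len(php) - 1) + ["LH"]
    raise ValueError(f"unknown accentuation mode: {mode}")


def count_lh(tones: List[str]) -> int:
    """Count the LH sequences that were assigned."""
    return sum(1 for tone in tones if tone == "LH")


if __name__ == "__main__":
    example_php = ["casa", "grande", "bonita"]  # hypothetical prosodic words
    for mode in ("main", "phrase"):
        tones = assign_lh_sequences(example_php, mode)
        print(mode, tones, count_lh(tones))
```

The sketch deliberately leaves out everything that distinguishes phrase accentuation I from II (association versus edge alignment of the H tone), since that difference concerns constraint ranking rather than the number of tone sequences.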


7.3 Conclusions and outlook 


Now that the main results for each language have been summarized, we are
in a position to synthesize them. The two analytical chapters 5 and 6 aimed at
answering the first two main research questions, with in particular the OT-anal- 
yses also paving the way for an answer to the third main question of comparing 
Huari Spanish and Quechua prosody, providing the two halves necessary for it, as 
it were. In these conclusions, I want to reap the fruits of labour of the foregoing 
pages and fully focus on what consequences the results have for the third research 
question. I will integrate the results both from a typological perspective and from 
one that treats them as evidence of a shared prosodic grammar in a multilingual 
community. 


7.4 Bringing Spanish and Quechua together


In order to discuss what aspects of Huari Quechua and Spanish prosody are lan- 
guage-specific, and which are shared, I first want to briefly return to the OT-analy- 
ses. The descriptions and analyses in the preceding chapters have shown that bilin- 
gual Huari speakers employ a range of intonational variants both in their Spanish 
and their Quechua. Formulated in terms of constraint rankings, it becomes clear 
that considerable similarities exist between them that cross the boundary between 
the two languages. (215) and (216) repeat the alignment constraints of what I have 
called the word-penult and the word-boundary variant of Quechua rise-falls, while 
(217) gives the ranking for an IP with phrase accentuation of Spanish as exemplified 
by NQ01, with all association constraints removed. This is done to facilitate compa- 
rability, but in fact, as pointed out at the end of section 5.3.2.3, insofar as the phrase 
accentuation variant of Spanish is concerned that is exemplified in utterances by 
speaker NQ01, the assumption of association is a somewhat theoretical matter 
anyway. LINEARITY, MAXIO(T), NOCROWD are ranked above all other constraints in 
the rankings here and left out. Between the phrase accentuation of Spanish and 
the word boundary variant of Quechua, there are some differences that are due to 
the fact that the tones in Quechua are all taken to belong to the PhP, whereas the 
Spanish tones are a mixture of PhP- and IP-tones. In the Quechua ranking, there are 
also constraints referring to the boundary of the prosodic word that do not have 
a counterpart in the Spanish ranking, and the constraint aligning the H with the 
stressed syllable, ALIGN (H, σ́), is split up in two constraints separately directing
alignment with its left and right edges, ALIGN (H, σ́, Lt) and ALIGN (H, σ́, Rt). Then
there are actual differences in ranking, such as the relative position of ALIGN (L, 
Lt;) and ALIGN (L, Rtg) in Spanish and their Quechua counterparts ALIGN (Lor; Lto) 
and ALIGN (Lor; Rtg). 


(215) Alignment constraint ranking for the word-penult variant of Quechua (rise-
falls)

ALIGN (Lori Rtg) >> ALIGN (Lor; Lto) >> ALIGN (H, σ́, Lt) >> ALIGN (H, σ́, Rt)
>> ALIGN (H, Rta) >> ALIGN (Loro Lto) >> ALIGN (H, Rtg) >> ALIGN (Loro Rtg)
>> ALIGN (H, Lte) >> ALIGN (H, Lt) >> ALIGN (H, Lt.)


(216) Alignment constraint ranking for the word-boundary variant (without
multiple alignment of the H tone) of Quechua (rise-falls)

ALIGN (Loro Rtg) >> ALIGN (Lor; Lto) >> ALIGN (H, Rt,) >> ALIGN (Leg, Lto)
>> ALIGN (H, Rt;) >> ALIGN (Lor; Rt) >> ALIGN (H, Lte) >> ALIGN (H, σ́, Lt)
>> ALIGN (H, σ́, Rt) >> ALIGN (H, Lt,,) >> ALIGN (H, Lt,,)


(217) Alignment constraint ranking for the phrase accentuation variant of Spanish
(NQ01's variant) without association constraints, on paroxytones

ALIGN (T, Rt) >> ALIGN (H, Rt;) >> ALIGN (H, σ́) >> ALIGN (L, Rt;)
>> ALIGN (L, Ltẹ) >> ALIGN (H, Lto) >> ALIGN (T, Rt)

However, there is also considerable overlap: in both rankings, the H tone seeks to 
right-align and is blocked in that effort by the presence of an L whose right-align- 
ment has precedence; the H tone also seeks to left-align, but the rightwards push 
of the left L tone is stronger, so that the H forms a peak on the phrase-final penult. 
Phenotypically, both rankings can be said to converge in that they result in an 
effectively identical pitch contour of a phrase-final peaked rise-fall in phrases with 
rightmost prominence. For both rankings, there are also attested variants in which 
the constraint that right-aligns the L and the one left-aligning the H are reranked 
relative to each other (cf. sections 5.3.2.1 and 6.3). These result in the attested pla- 
teau-like realizations that are more frequent in the Quechua data than in Spanish 
but are also similar phenotypically. Thus, the plateau-like realizations are also a 
convergence point for at least two different rankings from the two languages. The 
phrase-final peaked rise-fall is also generated from the word-penult variant in 
Quechua, and the main variant in Spanish on paroxytones in IP-final words, so that 
there are at least four different rankings from the two languages that converge in 
this contour if only the final word is considered. Equally, the ranking for Quechua 
rises in the word boundary-variant (cf. sections 6.3.1.1, 6.3.2.1, 6.3.3) and that for 
the Spanish phrase accentuation in the variant exemplified by XJ45 and NQ01 (cf.
section 5.3.2.3) are similar enough that they result in phenotypically very similar contours.
Thus both in terms of phenotypical contour shape and in terms of important
aspects of the constraint rankings, there is a convergence for some of the described 
variants for both languages, while other variants are more divergent, independent 
of which language they belong to. 
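
How different rankings can converge on the same contour, while reranking two constraints yields a different one, can be sketched with a small Python example of OT evaluation under strict domination. This is a toy illustration only: the constraint labels L_RIGHT, H_LEFT and OTHER, the two candidates, and all violation counts are invented for the sketch and do not reproduce the rankings in (215) to (217).

```python
from typing import Dict, List

# Violation profile of a candidate: constraint label -> number of violations.
Violations = Dict[str, int]


def evaluate(candidates: Dict[str, Violations], ranking: List[str]) -> str:
    """Return the optimal candidate under strict domination.

    Violations are compared constraint by constraint in ranking order, so a
    single violation of a higher-ranked constraint outweighs any number of
    violations of lower-ranked ones (lexicographic tuple comparison).
    """
    def profile(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    return min(candidates, key=profile)


# Hypothetical candidates with invented violation counts.
CANDIDATES: Dict[str, Violations] = {
    "peaked rise-fall": {"L_RIGHT": 0, "H_LEFT": 1, "OTHER": 1},
    "plateau": {"L_RIGHT": 1, "H_LEFT": 0, "OTHER": 1},
}

# Two different rankings that select the same contour ...
RANKING_A = ["L_RIGHT", "H_LEFT", "OTHER"]
RANKING_B = ["OTHER", "L_RIGHT", "H_LEFT"]
# ... and one in which L_RIGHT and H_LEFT are reranked relative to each other.
RANKING_C = ["H_LEFT", "L_RIGHT", "OTHER"]

if __name__ == "__main__":
    for label, ranking in (("A", RANKING_A), ("B", RANKING_B), ("C", RANKING_C)):
        print(label, evaluate(CANDIDATES, ranking))
```

In this toy setting, rankings A and B differ but select the same candidate, whereas swapping the two alignment constraints in ranking C selects the plateau candidate, which mirrors the kind of convergence and reranking discussed above.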

It is also clear, however, that both the underlying prosodic structures and the
detailed phonetic realizations are not all the same. For one, section 6.1.6 showed 
that in the Quechua sample used there, peak alignment is affected by the relative 
syllable weight between the final two syllables independent of stress, a subtle 
interaction of quantity-sensitivity and tonal alignment not observed for Spanish. 
At a more structural level, for Quechua there has been no need in the analysis to 
propose more than one phrasal level in the prosodic structure that assigns tones, 
while for Spanish, both a smaller, PhP-like level and a larger IP-like one are taken 
to provide tones. Yet for both languages, there is some evidence that the highest 
level is recursive. Some pitch contour phenotypes are only attested for one of the 
languages, notably the “main” variant of Spanish and the only-falling contour of
Quechua. Thus I do not suggest that Quechua and Spanish are “the same” at a relevant
level of representation. Instead, I think that by applying an OT analysis to the
observed variants from both languages it has been possible to decompose them 
into more fundamental variable properties that can assume at least two different 
values. For some of these properties, all values are shared, i.e. attested in both lan- 
guages, while for others one or more are specific to one of the two languages. In 
other words, the variation of some of these properties is orthogonal to the separation
between the two languages, extending across both of them, while that of
others is parallel to it, remaining confined to one of the languages. In the overall
variation space, these properties can assume converging or diverging configurations,
so that the observed forms resulting from such configurations are either
shared or language-specific. In the OT-analysis for each language I tried to model
the constraint rankings in such a way that they are maximally specific for each lan- 
guage, making sure that their variation is consistent and plausible mainly within 
the language itself, by trying to reduce the number of changes from one variant to 
the next. In this way, what could be called the outer limits of the variation space 
were traced. 

However, the fact that different constraint rankings result in effectively the same
contours suggests that a number of intermediate, less peripheral ranking variants also
exist, a large number of which are presumably available for both languages. In this
way, what I have tried to describe is a part of the shared prosodic resources available 
to the Huari speaker community as a whole. This can be described as having a core 
of shared elements, both in terms of constraint rankings and contour phenotypes, 
which cannot really be said to belong to either language. More peripheral elements 
that are specific to one of the languages also form part of these resources. Saying that 
this is a partial description of the linguistic resources of the Huari speaker commu- 
nity does not imply that they are all available to all speakers equally. This is another 
way in which these resources can be thought of as having a centre and a periphery. 
Central in this sense are those resources that are available and frequently used by 
all, while peripheral ones are those that are only available to some, and used infrequently.
Table 51 relates some of the variable prosodic properties in both languages
to the variants identified in this work. In it, the variable properties are given with 
the different values they have been shown to assume in the analysis. Gray shading 
is used to contrast these different values. At the top, the separation between the two 
languages is marked. For nearly all of the properties given in the table, the separa- 
tion between the values does not coincide exclusively with the separation between 
the languages. At least for most of these points of prosodic variation, their variation 
space is not most usefully described by assigning one of the values to Spanish and 
the other to Quechua. So at least in this synchronic prosodic description, it is difficult 
to rigorously differentiate the data using the labels of Quechua and Spanish on a 
formal level.⁴⁵⁰ The different variants as clusters of these values are perhaps more
aptly described as prosodic variants available to the Huari speaker community. 


7.4.1 Degrees of caring for stress in both languages and typological 
considerations 


The main variational parameters in the OT analysis of both Quechua and Spanish
can all be related to different degrees of “caring” for a culminative, obligatory posi- 
tion specified at the word level (word stress), even though they all move along dif- 
ferent typological dimensions: alignment of tones with phrasal boundaries clearly 
serves delimitative, rather than culminative, functions. Alignment to word bound- 
aries also at most serves to distinguish more from less prominent material at the 
word level, but not at the syllable level. Whether the H tone spreads or not can 
also be linked to this functional difference, since if a H tone is spreading across 
several syllables, it automatically loses some of its potential to call attention to a 
culminative position of syllabic size at the word level. Adjusting the tonal contour 
around the “inherited” stress position in a Spanish loanword of course aids in 
cueing the lexical identity of the word via intonation. This function is entirely 
absent in native Quechua words in any variant, because the fully regular word 
stress on the penult cannot distinguish between lexical items. For Spanish, it is the 
default in the main accentuation variant. Finally, minimal tone sequence distri- 
bution also plays a crucial role: only if tones are assigned in sufficient numbers 


450 Questioning “named languages” as an analytical instrument to describe a synchronic linguistic
reality, especially in multilingual contexts, is a central notion in the field of translanguaging
(García & Li 2014; Otheguy et al. 2015) as well as others. That does not diminish their usefulness in 
historical contexts or as heuristic descriptors, or negate that they have a sociolinguistic reality for 
many speakers. 
can they be used to regularly cue the word stress position. The variants in both
languages effectively occupy all of these positions. On a cline going from optimizing 
a culminative, obligatory and lexically specified word stress position to optimizing 
phrasal units via delimiting tonal movements, the main accentuation variant of 
Spanish would be at one extreme. The “grafted” and then the “inherited” patterns 
of Quechua follow after, but at a definite distance, even though they also cue a
lexically specified, instead of a regular, syllabic position. This is on the one hand 
because in the “grafted” pattern the phrasal tones are expressed in addition to the 
pitch accent on the word stress, arguably subtracting from the culminativity of
the pitch accent, and because the “inherited” pattern can still form plateaus. On 
the other, their optionality and indeed infrequency (cf. section 6.1.8.3) means that 
the condition of obligatoriness is not even fulfilled for their restricted subset of the 
lexicon. Further towards the other end we would find the first phrase accentua- 
tion variant of Spanish, and the word-penult variant of Quechua, both with only 
one tone sequence per PhP containing more than one prosodic word. Fully at
the other extreme there are the second phrase accentuation variant of Spanish as 
exemplified by NQ01, and the word-boundary variant of Quechua without multiple 
H tone alignment, in both of which constraints making reference to a syllabic prom- 
inent position play no role anymore and tonal alignment is purely oriented towards 
phrase edges. The variants however differ with regards to how they are weighted 
in terms of frequency: while the main accentuation pattern is the default for Huari 
Spanish, the “grafted” and “inherited” patterns of Quechua are infrequent and 
marked because they can only occur on a subset of the lexicon. The word-penult 
pattern is likely the most frequent pattern of Huari Quechua, and the purely edge- 
based variants are not the default in either language, but probably more frequent in 
Quechua. Recall also that while the word-penult variant might be the most frequent 
one for Quechua, the difference between it and the more edge-based word-bound- 
ary variant only really emerges in a subset of configurations (cf. sections 6.1.2, 6.1.3, 
6.1.4). In addition, we have seen evidence throughout that Quechua often reduces 
penults via other prosodic parameters (duration, vowel quality) even if they bear
the tonal transition. Thus even though both languages cover a wider range of pos- 
sibilities with regards to this intonational optimization of word stress or phrasing, 
Huari Quechua as a whole (especially when considering the frequency of variants)
can indeed be said to “care” considerably less about stress than Huari Spanish, with 
the main accentuation of Spanish occupying a position in this respect that could not 
be observed to be reached by any of the Quechua variants. 

With regard to the other typological parameters that were discussed in
section 3.4.6, the findings have likewise shown that the variants of both languages occupy
positions covering a lot of the ground staked out by them. Huari Spanish in the 
main accentuation variant is comparable to Egyptian Arabic in that pitch accents 
of the same regularized form (LH*) are realized on virtually every accentable word.
In the typology of Jun (2014b), it would therefore be classed as head-prominent (marking
prominence at the AP/PhP-level via pitch events on the heads of phrases), having 
word stress, and exhibiting strong (tonal) macro-rhythm because it produces the 
same rising pitch movements at intervals roughly corresponding to the length of a 
content word (cf. Jun 2014b: 526, 528). Huari Quechua in the word-penult variant 
marks both the heads of phrases (derived from the regular stress on the strongest 
word) and their edges, has word stress (even though it “cares” very little about
it), and produces both falling and rising pitch movements on phrase heads spaced 
at intervals usually greater than prosodic words, with stretches of level tones 
in between. It is thus comparable perhaps to French or Japanese in Jun (2014b: 
531), but with a weaker macro-rhythm because the edge tones of the PhP/AP are 
more variable (both can be either L or H depending on the type of contour). With 
the less frequent variants, their positions along these parameters shift substan- 
tially further in both languages. Arguably, both the second variant of the Spanish 
phrase accentuation as well as the word boundary-variant of Quechua are only 
edge-prominence marking, and at least with the word boundary-variant with mul- 
tiple H tone alignment it is even questionable to class it as showing any evidence 
of having word stress, so that it is then most comparable to Korean (cf. Jun 2014b: 
532). On the other hand, a sequence of words produced in the “grafted” pattern of
Quechua would have a stronger macro-rhythm than the word penult-variant. The 
point here is not so much that applying the parameters is difficult, even though we 
have seen throughout that classifying Huari Quechua as a language with or without 
word stress is a complicated issue. I think that these parameters do capture some 
useful and intersubjectively describable properties, but applying them to the dif- 
ferent variants of both Huari Quechua and Spanish puts in doubt that the objects 
to which these properties belong should really be taken to be “whole” languages. 
In Jun (2014b), different varieties of e.g. Japanese and Arabic that are spoken by 
geographically separate populations are also classified differently in the typology, 
but in our case, a large part of the spectrum is covered by variants of two languages 
spoken by the same speakers in the same locality. I think nevertheless that this 
does not mean that prosodically all languages or varieties are essentially “the same”,
just that these parameters are perhaps not fully apt to distinguish between languages
in situations of extensive multilingualism.

Consider again pitch accent inventory and stress position optimization. Even 
though Jun (2014b) introduces them as independent typological parameters, her 
findings present what is effectively quite a strong correlation between the param- 
eter of prominence-marking (head, edge, or head/edge) and that of macro-rhythm: 
the edge-marking languages she lists all have strong macro-rhythm, and the head/ 
edge-marking languages have either strong or medium macro-rhythm. Only the 
head-marking languages cover the full spectrum from strong to weak macro-rhythm, 
and whether a language has a large paradigmatic inventory of pitch accents is a 
decisive factor for it to have either medium or weak macro-rhythm. The property 
of word prosody (stress, lexical pitch accent, tone, or no word prosody) is also cor- 
related: as she points out (Jun 2014b: 530), languages with a lexical pitch accent that 
occurs only on a subset of the lexicon, including Tokyo Japanese, some varieties 
of Basque, and arguably also Huari Quechua with Spanish loanwords (cf. section 
6.1.8.2), rarely have more than a single pitch accent choice and thus are more likely 
to have a stronger macro-rhythm. In trying to explain this correlation in search of 
substantial typological parameters, we could speculate that a linguistic system can 
only be expected to reliably encode paradigmatic tonal differences at a non-peripheral
position if this position is sufficiently present in the mind of speakers (presumably
lexicalized) that it is possible to generate expectations associated with it, i.e. 
that “something” must happen there. Only with such a degree of salience might it 
then be possible to exploit the expectations connected to this position by varying the 
kind of pitch event occurring there in a systematic relation to meaning (including 
by varying tonal alignment relative to the temporal pivot that such a position pro- 
vides). It seems that in systems where such a position is defined only in a subset of 
the lexicon, or is overall not very conspicuous (as in Huari Quechua), this threshold 
of salience for paradigmatic variation is not reached. This is of course a hypothesis 
as it stands and its claims need to be further corroborated empirically. However, it 
would point to the size of the paradigmatic tonal inventory and whether it encodes 
pragmatic meaning differences being one really substantial typological criterion. In 
this respect, should the findings about Huari Spanish and Quechua presented here 
hold up in future research, Huari Spanish could then be argued to be just as similar 
to Huari Quechua as e.g. to Peninsular varieties of Spanish. This is because even 
though Huari Spanish does have the possibility to express differences in pragmatic 
meaning beyond continuation and finality, this functional load is almost entirely 
borne by the small boundary tone inventory (also via pitch accent alignment), since 
likely only a single pitch accent option is available. Thus this possibility is severely 
reduced compared to what has been described for Peninsular Spanish varieties 
(Hualde & Prieto 2015; Fliessbach 2023, cf. also section 3.4.3), in practice almost as 
much as in Huari Quechua, where it was not found (cf. Table 52). 


7.4.2 Outlook 
Table 52: Further points of prosodic variation between Huari Spanish and Quechua not specifically 
identified for any of the variants. 

Property | Huari Spanish | Huari Quechua 
Paradigmatic tone choice signals continuity/finality | ✓ | ✓ 
Paradigmatic tone choice alone signals additional pragmatic meanings, e.g. utterance type | ✓ | ✗ 
Tone choice plus particles signal additional pragmatic meanings, e.g. utterance type | ✓ | ✓ 
Syntagmatic tone distribution signals utterance type | ✓ | ✗ 
Syntagmatic tone distribution signals information structure | (more in phrase acc. variant than in main variant) | ✓ 
Alignment differences relative to a stressed position cue additional pragmatic meanings | ✓ | ✗ 
Number of phrasal levels above the prosodic word | 2 | 1 
Top level recursive | ✓ | ✓ 
Devoicing in word- or phrase-final syllables | ✓ | ✓ 
Devoicing leads to “displacement” of salient pitch movement | ✗ | ✓ 
Interaction of syllable weight and tonal alignment independent of stress | ✗ | ✓ 
Vowel reduction (centralization/devoicing) | ✓ | ✓ 
Vowel reduction sensitive to word stress | ✓ | ✗ 
Deaccentuation/dephrasing/tonal compression | ✓ | ✓ 
Pitch scaling employed to express local prominence contrasts | ✓ | ✓ 
Pitch scaling employed to express nonlocal structure | ✓ | ? 

Table 52 summarizes some further prosodic properties discussed in the analyses that 
have not been linked to any of the identified variants. The results in both tables 
should not be taken as definitive. What Table 51 asserts about the property of association
is based on extrapolation from the results of the peak alignment measurements
and assumptions in the literature. It is probable that the H tone in Quechua 
sometimes associates, and sometimes doesn’t, but it is unclear whether it does this 
variably in relation to the identified variants. In Table 52, the absence of a prop- 
erty for one language can only mean that in the data analysed here, no evidence 
for it was found. It should be clear that further research would be needed to con- 
solidate these results. Nevertheless, here as well there are properties shared by 
both languages, while others are specific to one of them. It is a separate question 
which of the shared ones are universal or at least very frequent in the languages 
of the world, and which are perhaps specific to the Huari speaker community. For 
example, the signaling of something like continuity and finality via intonation is 
probably frequent overall. In contrast, in the case of vowel reduction it is well-known
that many European varieties of Spanish hardly ever reduce vowels, so 
this is clearly not a universal feature. However, some kind of vowel reduction has 
also been reported for varieties of both Spanish and Quechua from other regions 
of Peru (Delforge 2008, 2011; Crignis 2018). That would suggest that some form of it 
might be a feature crossing the boundaries between the two languages in a wider 
area. The proposal that a grammar of a multilingual speech community is made up 
of elements that are specific to one language, elements that are specific to another, 
and elements that are community-specific, is also made in Höder (2014a, 2014b, 
2018) in a construction grammar-based approach. This does not preclude that some 
of the community-specific features are potentially more broadly universal. Höder’s 
approach rejects “the notion of pre-existing ‘languages’ in the sense of distinct 
language systems” (Höder 2014b: 216). He argues that the difference between bi- 
and multi-lingualism is gradual and that multilingualism in the sense of a speaker 
having command over several sociolinguistically differentiated varieties is ubiqui- 
tous. Grammar as a set of such socially distinct variants made up of specific values 
that communally relevant variables can assume is then seen as specific to a speaker 
community. Crucially, observable linguistic differences within a community are 
taken to be socioindexically meaningful, and the dynamic negotiation of these 
socioindexical meanings is the driving engine for conventionalization and change 
(Höder 2014b: 225, 2018: 44). In the present analysis, we have come to identify a 
number of variants that are specific to either language or the speaker community 
to different degrees. The results presented here demonstrate that the concept of a 
repertoire of shared grammatical resources for a multilingual community is mean- 
ingfully applicable also in the case of prosody and intonation and even when the 
two languages in question would normally be considered to be both genetically and 
typologically distant from each other. What has only been touched upon in passing 
is the social meaning of using these variants in the Huari speaker community. It 
has become clear throughout that even within the same elicitation task, there is a 
considerable degree of speaker-specific variation. This relates not only to the pref- 
erential use of the identified intonational variants in either language, but also to 
other areas of grammar and lexical choice, as well as more global prosodic stylistic 
choices such as speaking with a comparatively small pitch range. In a sense, this 
work has only been the first step of inspecting the inventory in terms of exploring 
the Huari speaker community. I began it by saying that it is a description of the 
prosody of two varieties of Spanish and Quechua spoken by the same speaker com- 
munity. Perhaps a more accurate statement would have been that it is a description 
of the prosody of that community. It is a task for the future to truly relate its varia- 
bility to community-specific sociopragmatic meanings. 


Appendix A - Weblinks to the audio files 


Chapter 3 

(31) https://osf.io/d7tzc/ 
Chapter 5 

Figure 20 https://osf.io/vyxtp/ 
Figure 21 https://osf.io/ypmb4/ 
Figure 22 https://osf.io/87j56/ 
Figure 23 https://osf.io/hv4ab/ 
Figure 24 (left) https://osf.io/z8vkq/ 
Figure 24 (right) https://osf.io/fxujy/ 
Figure 25 https://osf.io/4685c/ 
Figure 26 https://osf.io/r73wd/ 
Figure 27 https://osf.io/g5fs8/ 
Figure 28 https://osf.io/bhf3e/ 
Figure 29 https://osf.io/tr5j6/ 
Figure 30 https://osf.io/3w6sb/ 


Figure 31 (top) https://osf.io/cke6p/ 
Figure 31 (bottom) https://osf.io/vabz2/ 


Figure 32 (top) https://osf.io/rezpf/ 
Figure 32 (bottom) https://osf.io/z7eb2/ 
Figure 33 https://osf.io/tg5r2/ 
Figure 34 https://osf.io/cp8uk/ 
(41) https://osf.io/s8vdr/ 
Figure 35 https://osf.io/zjfv3/ 
Figure 36 https://osf.io/8ph9d/ 
Figure 37 https://osf.io/nyqjr/ 
Figure 38 https://osf.io/e97u3/ 
Figure 39 https://osf.io/3v8wy/ 
Figure 40 https://osf.io/hvw8e/ 
Figure 41 https://osf.io/q5f3z/ 
(43) https://osf.io/jfk4a/ 
Figure 42 https://osf.io/gby2a/ 
(44) https://osf.io/rd3vm/ 
Figure 43 https://osf.io/bu9tw/ 
Figure 44 https://osf.io/qmtew/ 
Figure 45 https://osf.io/xhdu7/ 
Figure 46 https://osf.io/gt423/ 
Figure 47 https://osf.io/cdrwf/ 
Figure 48 https://osf.io/39h2e/ 
Figure 49 https://osf.io/r4gsd/ 
Figure 50 https://osf.io/ht96p/ 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the Creative 
Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783111304595-008 




Figure 51 https://osf.io/detz6/ 
Figure 52 https://osf.io/pcn4b/ 
Figure 53 https://osf.io/8geu6/ 
(48) https://osf.io/vn2c4/ 
Figure 55 https://osf.io/u2bzp/ 
Figure 56 https://osf.io/xgjkr/ 
Figure 57 https://osf.io/y75en/ 
Figure 58 https://osf.io/m47pq/ 
Figure 59 https://osf.io/zudme/ 
(50) https://osf.io/st6d8/ 
(51) https://osf.io/xp67y/ 
(52) https://osf.io/nvc23/ 
(53) https://osf.io/uqptz/ 
Figure 60 https://osf.io/mx2k7/ 
Figure 61 https://osf.io/9c6nt/ 
Figure 62 https://osf.io/eut2x/ 
Figure 63 https://osf.io/h6wjb/ 
Figure 64 https://osf.io/ayp93/ 
Figure 65 https://osf.io/7znmr/ 
Figure 66 https://osf.io/m7vuf/ 
(56) https://osf.io/udz93/ 
Figure 67 https://osf.io/uqa68/ 
(57) https://osf.io/qnjp6/ 
Figure 68 https://osf.io/vxceg/ 
Figure 69 https://osf.io/skfvr/ 
(58) https://osf.io/ntxpa/ 
(59) https://osf.io/Axtd2/ 
Figure 70 https://osf.io/vun95/ 
Figure 71 https://osf.io/vht2m/ 
Figure 72 https://osf.io/j2mpv/ 
(60) https://osf.io/9fkwu/ 
Figure 73 https://osf.io/xdz3q/ 
Figure 74 https://osf.io/dkawc/ 
(61) https://osf.io/pu9jg/ 
Figure 75 https://osf.io/evu35/ 
(62) https://osf.io/94w2h/ 
Figure 76 https://osf.io/8gk74/ 
(63) https://osf.io/cgk7u/ 
Figure 78 https://osf.io/65yzm/ 
Figure 80 https://osf.io/hp4fm/ 
Figure 82 https://osf.io/x9d4r/ 
Figure 84 https://osf.io/uzwa4/ 
Figure 87 https://osf.io/ugehf/ 
Figure 89 https://osf.io/qskvf/ 
Figure 91 https://osf.io/bema7/ 
Figure 94 https://osf.io/rg8zh/ 


Figure 95 https://osf.io/c3uv9/ 


Figure 98 https://osf.io/4seky/ 
Figure 99 https://osf.io/32k64/ 
Figure 102 https://osf.io/xt2sb/ 
Figure 103 https://osf.io/4n6ur/ 
Figure 104 https://osf.io/67ezm/ 
Figure 106 https://osf.io/h4wgn/ 
Figure 107 https://osf.io/sb3v9/ 
Figure 111 https://osf.io/d46st/ 
Figure 112 https://osf.io/rv8u5/ 
Figure 113 https://osf.io/uxenr/ 
Figure 114 https://osf.io/acp7y/ 
Figure 115 https://osf.io/mfbse/ 
Figure 116 https://osf.io/rquxg/ 
Figure 117 https://osf.io/y97cx/ 
Figure 118 https://osf.io/xe759/ 
Figure 119 https://osf.io/g3f56/ 
Figure 120 https://osf.io/bruj9/ 
Figure 121 https://osf.io/8tybp/ 
Figure 122 https://osf.io/ujxny/ 
Figure 123 https://osf.io/dvc76/ 
Figure 124 https://osf.io/wrjpn/ 
Figure 125 https://osf.io/m4ec2/ 
Figure 126 https://osf.io/bgxzj/ 
Figure 127 https://osf.io/yxzuw/ 
Figure 128 https://osf.io/9mbu3/ 
Figure 129 https://osf.io/8um7v/ 
Figure 131 https://osf.io/ka8zq/ 
Figure 132 https://osf.io/pw2ad/ 
Figure 133 https://osf.io/3c2xa/ 
Figure 134 https://osf.io/Ssfym/ 
Figure 135 https://osf.io/wj9cv/ 
Figure 136 https://osf.io/dj8us/ 
Figure 138 https://osf.io/z7x3s/ 
Figure 139 https://osf.io/q83es/ 
Figure 140 https://osf.io/g3x84/ 
Figure 141 A1 https://osf.io/yuztx/ 
Figure 141 A2 https://osf.io/jb68t/ 
Figure 141 A3 https://osf.io/zx4dp/ 
Figure 141 B1 https://osf.io/uks6e/ 
Figure 141 B2 https://osf.io/fAycp/ 
Figure 141 B3 https://osf.io/2z5wq/ 
Figure 142 A1 https://osf.io/cqr9m/ 
Figure 142 A2 https://osf.io/u9ntz/ 
Figure 142 A3 https://osf.io/f98ph/ 
Figure 142 B1 https://osf.io/fsd43/ 
Figure 142 B2 https://osf.io/atdur/ 
Figure 142 B3 https://osf.io/cjf3s/ 
Figure 143 1 https://osf.io/cn9aj/ 
Figure 143 2 https://osf.io/Abkp8/ 
Figure 143 3 https://osf.io/cnw9x/ 
Figure 143 4 https://osf.io/cAd2b/ 
Figure 144 1 https://osf.io/8kjng/ 
Figure 144 2 https://osf.io/h3rjy/ 
Figure 144 3 https://osf.io/ugahr/ 
Figure 144 4 https://osf.io/s6mtg/ 
Figure 144 5 & 6 https://osf.io/v9epg/ 
Figure 145 1 https://osf.io/fe78n/ 
Figure 145 2 https://osf.io/rqj7e/ 
Figure 145 3 https://osf.io/t98v2/ 
Figure 145 4 https://osf.io/8d4jt/ 
Figure 146 1 https://osf.io/qxm84/ 
Figure 146 2 https://osf.io/asqex/ 
Figure 146 3 https://osf.io/ga634/ 
Figure 146 4 https://osf.io/efqnp/ 
Figure 147 https://osf.io/mxd5q/ 
Figure 148 https://osf.io/veh3q/ 
Figure 149 https://osf.io/rS6wf/ 
(126) https://osf.io/etny4/ 
Figure 155 https://osf.io/zmbpk/ 
Figure 156 https://osf.io/z4re9/ 
Figure 157 https://osf.io/9u4s6/ 
Figure 158 https://osf.io/e2b4y/ 
Figure 159 https://osf.io/3ejw9/ 
Figure 160 https://osf.io/judpy/ 
Figure 161 https://osf.io/qbuk3/ 
Figure 162 https://osf.io/rcqxa/ 
Figure 163 https://osf.io/fr2py/ 
Figure 164 https://osf.io/g6qux/ 
Figure 165 https://osf.io/wq9s5/ 
Figure 166 https://osf.io/kh29b/ 
Figure 167 https://osf.io/unpge/ 
(144) https://osf.io/tay7p/ 
Figure 168 https://osf.io/9fw5u/ 
Figure 169 https://osf.io/hrxyq/ 
(145) https://osf.io/rnq5y/ 
(146) https://osf.io/5j46n/ 
Figure 170 https://osf.io/wsb8y/ 
Figure 171 https://osf.io/Aaqbd/ 


Figure 172 https://osf.io/8nvrq/ 


Figure 173 https://osf.io/uxqtc/ 
(183) https://osf.io/m397j/ 
Figure 174 https://osf.io/dmf6u/ 
Figure 175 https://osf.io/9h8u3/ 
(184) https://osf.io/s9kqr/ 
Figure 176 https://osf.io/4as7m/ 
Figure 177 https://osf.io/wur83/ 
(185) https://osf.io/pcu9b/ 
Figure 178 https://osf.io/ptjnu/ 
Figure 179 https://osf.io/8v7rn/ 
(186) https://osf.io/bg78v/ 
Figure 180 https://osf.io/ajm2n/ 
(187) https://osf.io/jg2z8/ 
Figure 181 https://osf.io/7g3vk/ 
Figure 182 https://osf.io/xn256/ 
Figure 183 https://osf.io/4ahgqe/ 
Figure 184 https://osf.io/w5kzy/ 
Figure 185 https://osf.io/bhqxy/ 
Figure 186 https://osf.io/28nq9/ 
(194) https://osf.io/z6akq/ 
(195) https://osf.io/3r5x8/ 
Figure 187 https://osf.io/6jfz4/ 
Appendix B - Maptask maps 


Maptask maps as used by TP03 & KP04_MT_Q (section 6.4.2) 


Figure 188: Maptask map without the path as used by TP03 in TP03 & KP04_MT_Q. 


Open Access. © 2024 the author(s), published by De Gruyter. This work is licensed under the Creative 
Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 
https://doi.org/10.1515/9783111304595-009 
Figure 189: Maptask map with the path as used by KP04 in TP03 & KP04_MT_Q. 


Maptask maps as used by ZR29 & HA30_MT_ES, S039 & MD40_MT_ES, and TP03 & KP04_MT_ES 
(section 5.1.3.3), and SG15 & QF16_MT_ES (sections 5.1.3.2.1 and 5.1.3.2.2) 

Figure 190: Maptask map without the path as used by HA30, MD40, KP04, and QF16 in ZR29 & HA30_MT_ES, 
S039 & MD40_MT_ES, TP03 & KP04_MT_ES, and SG15 & QF16_MT_ES. 

Figure 191: Maptask map with the path as used by ZR29, S039, TP03, and SG15 in ZR29 & HA30_MT_ES, 
S039 & MD40_MT_ES, TP03 & KP04_MT_ES, and SG15 & QF16_MT_ES. 
Script used in section 5.2.3 


########################################################################
### relative_peaks_elqud.praat
###
### Timo Buchholz
### timobuch@zedat.fu-berlin.de
### April 2021
###
### Praat 6.1.16
########################################################################

form Select directories (without final slash)
    comment Where are the WAV files kept?
    sentence wav_dir C:\User\Folder
    comment Where are the TextGrid files kept?
    sentence txt_dir C:\User\Folder
    comment Where should the results file be kept?
    sentence directory C:\User\Folder
    comment Where are the pitch files kept?
    sentence pitch_dir C:\User\Folder
endform


# Create TXT file with header
filedelete 'directory$'/results.txt
header_row$ = "File" + tab$ + "Word.label" + tab$ + "number.all.acc.syls.utterance" + tab$ +
... "start.time.utterance" + tab$ + "end.time.utterance" + tab$ +
... "length.utterance" + tab$ + "acc.syl.label" + tab$ + "contrast.on.acc.syl.word" + tab$ + "number.this.acc.syl.utterance" + tab$ + "utterance.part" + tab$ + "wordclass" + tab$ + "start.acc.syl" + tab$ +
... "end.acc.syl" + tab$ + "length.acc.syl" + tab$ + "meanf0.accsyl" + tab$ + "position.max.acc.syl" + tab$ + "position.min.acc.syl" + tab$ +
... "maximum.acc.syl" + tab$ + "minimum.acc.syl" + tab$ +
... "mean.first.pretonic.fifth" + tab$ + "mean.second.pretonic.fifth" + tab$ +
... "mean.third.pretonic.fifth" + tab$ + "mean.fourth.pretonic.fifth" + tab$ +
... "mean.fifth.pretonic.fifth" + tab$ + "trans.meanf0.accsyl" + tab$ +
... "trans.maxf0.accsyl" + tab$ + "trans.minf0.accsyl" + tab$ +
... "trans.first.pretonic.fifth" + tab$ + "trans.second.pretonic.fifth" + tab$ +
... "trans.third.pretonic.fifth" + tab$ + "trans.fourth.pretonic.fifth" + tab$ +
... "trans.fifth.pretonic.fifth" + tab$ +
... "start.time.word" + tab$ + "end.time.word" + tab$ + "length.word" + tab$ + "label.first.utt.extreme" + tab$ + "label.second.utt.extreme" + tab$ + "utterance.maximum" + tab$ +
... "utterance.minimum" + tab$ + "pos.utterance.maximum" + tab$ +
... "pos.utterance.minimum" + tab$ + "word.end.topic1" + tab$ +
... "start.endsyl.topic1" + tab$ + "end.endsyl.topic1" + tab$ +
... "meanf0.endsyl.topic1" + tab$ + "maxf0.endsyl.topic1" + tab$ +
... "pos.max.endsyl.topic1" + tab$ + "minf0.endsyl.topic1" + tab$ + "pos.min.endsyl.topic1" + tab$ + "trans.meanf0.endsyl.topic1" + tab$ +
... "trans.maxf0.endsyl.topic1" + tab$ + "trans.minf0.endsyl.topic1" + tab$ + "word.end.comment1" + tab$ +
... "start.endsyl.comment1" + tab$ + "end.endsyl.comment1" + tab$ +
... "meanf0.endsyl.comment1" + tab$ + "maxf0.endsyl.comment1" + tab$ +
... "pos.max.endsyl.comment1" + tab$ + "minf0.endsyl.comment1" + tab$ + "pos.min.endsyl.comment1" + tab$ + "trans.meanf0.endsyl.comment1" + tab$ +
... "trans.maxf0.endsyl.comment1" + tab$ + "trans.minf0.endsyl.comment1" + tab$ + "word.end.topic2" + tab$ +
... "start.endsyl.topic2" + tab$ + "end.endsyl.topic2" + tab$ +
... "meanf0.endsyl.topic2" + tab$ + "maxf0.endsyl.topic2" + tab$ +
... "pos.max.endsyl.topic2" + tab$ + "minf0.endsyl.topic2" + tab$ + "pos.min.endsyl.topic2" + tab$ + "trans.meanf0.endsyl.topic2" + tab$ +
... "trans.maxf0.endsyl.topic2" + tab$ + "trans.minf0.endsyl.topic2" + tab$ + "word.end.comment2" + tab$ +
... "start.endsyl.comment2" + tab$ + "end.endsyl.comment2" + tab$ +
... "meanf0.endsyl.comment2" + tab$ + "maxf0.endsyl.comment2" + tab$ +
... "pos.max.endsyl.comment2" + tab$ + "minf0.endsyl.comment2" + tab$ + "pos.min.endsyl.comment2" + tab$ + "trans.meanf0.endsyl.comment2" + tab$ +
... "trans.maxf0.endsyl.comment2" + tab$ + "trans.minf0.endsyl.comment2" + tab$ +
... newline$
header_row$ > 'directory$'/results.txt


# Check whether there are pitch files already that have been used and corrected manually before.
# WARNING: You must have one pitch file per each wav file for this to work.
# If you don't, the script will overwrite your existing pitch files.

# List all pitch files
Create Strings as file list... list 'pitch_dir$'/*.Pitch
numberofPitchfiles = Get number of strings

# List all WAV files
Create Strings as file list... list 'wav_dir$'/*.wav
numberOfFiles = Get number of strings

# Loop that goes through all files
for ifile to numberOfFiles
    select Strings list
    fileName$ = Get string... ifile
    baseName$ = fileName$ - ".wav"
    # Read in the Sound files with that base name
    Read from file... 'wav_dir$'/'baseName$'.wav
    Read from file... 'txt_dir$'/'baseName$'.TextGrid
    ## hash the if-clause optionally if you know all the files have pitch files
    ## anyway, and only keep the "read from file..." line
    # if 'numberofPitchfiles' = 'numberOfFiles'
    Read from file... 'txt_dir$'/'baseName$'.Pitch
    # else
    # # Create Pitch object
    # select Sound 'baseName$'
    # To Pitch (ac)... 0.005 75 15 no 0.03 0.6 0.01 0.35 0.30 600
    # endif

    # Select the individual words
    selectObject: "TextGrid 'baseName$'"
    plusObject: "Sound 'baseName$'"
    Scale times
    select TextGrid 'baseName$'

    # Get start and end times for utterance (including pauses)
    nintervalbottomtier = Get number of intervals... 4
    startutterance = Get starting point... 4 2
    endutterance = Get end point... 4 'nintervalbottomtier'-1
    lengthutterance = (endutterance - startutterance)*1000

    # Get number of accented syllables in utterance
    naccsyltier = Get number of intervals... 2
    nemptyintervalaccsyltier = Count intervals where: 2, "is equal to", ""
    naccsyls = 'naccsyltier' - 'nemptyintervalaccsyltier'

    # look at pitch object and see if it has any obvious errors
    select Pitch 'baseName$'
    Shift times to: "start time", 0

    # hash out the following view & edit and pause lines if you already checked all
    # the pitch files and just want to generate the results
    # View & Edit
    # pause Edit pitch object, then press "continue" to save and proceed to the next file
    # determine positions for the final syllables in each of the four parts
    select TextGrid 'baseName$'
    top1endlabel$ = Get label of interval... 3 2
    com1endlabel$ = Get label of interval... 3 4
    top2endlabel$ = Get label of interval... 3 6
    com2endlabel$ = Get label of interval... 3 8
    starttop1end = Get starting point... 3 2
    startcom1end = Get starting point... 3 4
    starttop2end = Get starting point... 3 6
    startcom2end = Get starting point... 3 8
    endtop1end = Get end point... 3 2
    endcom1end = Get end point... 3 4
    endtop2end = Get end point... 3 6
    endcom2end = Get end point... 3 8

    # Get word label of final word at each part
    wordattop1end = Get interval at time... 1 ('starttop1end'+0.0001)
    top1endwordlabel$ = Get label of interval... 1 'wordattop1end'
    wordatcom1end = Get interval at time... 1 ('startcom1end'+0.0001)
    com1endwordlabel$ = Get label of interval... 1 'wordatcom1end'
    wordattop2end = Get interval at time... 1 ('starttop2end'+0.0001)
    top2endwordlabel$ = Get label of interval... 1 'wordattop2end'
    wordatcom2end = Get interval at time... 1 'startcom2end'+0.0001
    com2endwordlabel$ = Get label of interval... 1 'wordatcom2end'


    # determine positions and labels for the highest and lowest annotated points
    bottomtierlabel$ = Get label of interval... 4 2
    if bottomtierlabel$ <> ""
        firstextremelabel$ = Get label of interval... 4 2
        startfirstextreme = Get starting point... 4 2
        endfirstextreme = Get end point... 4 2
        secondextremelabel$ = Get label of interval... 4 4
        startsecondextreme = Get starting point... 4 4
        endsecondextreme = Get end point... 4 4
    endif
    if bottomtierlabel$ == ""
        firstextremelabel$ = Get label of interval... 4 3
        startfirstextreme = Get starting point... 4 3
        endfirstextreme = Get end point... 4 3
        secondextremelabel$ = Get label of interval... 4 5
        startsecondextreme = Get starting point... 4 5
        endsecondextreme = Get end point... 4 5
    endif
    # get f0 values for the endpoints of the parts and the extremes
    select Pitch 'baseName$'
    meanfirstextreme = Get mean... 'startfirstextreme' 'endfirstextreme' Hertz
    tMaxF0firstextreme = Get time of maximum... 'startfirstextreme' 'endfirstextreme' Hertz Parabolic
    tMinF0firstextreme = Get time of minimum... 'startfirstextreme' 'endfirstextreme' Hertz Parabolic
    f0maxfirstextreme = Get value at time... 'tMaxF0firstextreme' Hertz Linear
    f0minfirstextreme = Get value at time... 'tMinF0firstextreme' Hertz Linear
    meansecondextreme = Get mean... 'startsecondextreme' 'endsecondextreme' Hertz
    tMaxF0secondextreme = Get time of maximum... 'startsecondextreme' 'endsecondextreme' Hertz Parabolic
    tMinF0secondextreme = Get time of minimum... 'startsecondextreme' 'endsecondextreme' Hertz Parabolic
    f0maxsecondextreme = Get value at time... 'tMaxF0secondextreme' Hertz Linear
    f0minsecondextreme = Get value at time... 'tMinF0secondextreme' Hertz Linear
    if firstextremelabel$ == "H" and secondextremelabel$ == "L"
        utterancemax = 'f0maxfirstextreme'
        utterancemin = 'f0minsecondextreme'
        posutterancemax = 'tMaxF0firstextreme'
        posutterancemin = 'tMinF0secondextreme'
    endif
    if firstextremelabel$ == "L" and secondextremelabel$ == "H"
        utterancemax = 'f0maxsecondextreme'
        utterancemin = 'f0minfirstextreme'
        posutterancemax = 'tMaxF0secondextreme'
        posutterancemin = 'tMinF0firstextreme'
    endif

    meantop1end = Get mean... 'starttop1end' 'endtop1end' Hertz
    tMaxF0top1end = Get time of maximum... 'starttop1end' 'endtop1end' Hertz Parabolic
    tMinF0top1end = Get time of minimum... 'starttop1end' 'endtop1end' Hertz Parabolic
    f0maxtop1end = Get value at time... 'tMaxF0top1end' Hertz Linear
    f0mintop1end = Get value at time... 'tMinF0top1end' Hertz Linear
    meancom1end = Get mean... 'startcom1end' 'endcom1end' Hertz
    tMaxF0com1end = Get time of maximum... 'startcom1end' 'endcom1end' Hertz Parabolic
    tMinF0com1end = Get time of minimum... 'startcom1end' 'endcom1end' Hertz Parabolic
    f0maxcom1end = Get value at time... 'tMaxF0com1end' Hertz Linear
    f0mincom1end = Get value at time... 'tMinF0com1end' Hertz Linear
    meantop2end = Get mean... 'starttop2end' 'endtop2end' Hertz
    tMaxF0top2end = Get time of maximum... 'starttop2end' 'endtop2end' Hertz Parabolic
    tMinF0top2end = Get time of minimum... 'starttop2end' 'endtop2end' Hertz Parabolic
    f0maxtop2end = Get value at time... 'tMaxF0top2end' Hertz Linear
    f0mintop2end = Get value at time... 'tMinF0top2end' Hertz Linear
    meancom2end = Get mean... 'startcom2end' 'endcom2end' Hertz
    tMaxF0com2end = Get time of maximum... 'startcom2end' 'endcom2end' Hertz Parabolic
    tMinF0com2end = Get time of minimum... 'startcom2end' 'endcom2end' Hertz Parabolic
    f0maxcom2end = Get value at time... 'tMaxF0com2end' Hertz Linear
    f0mincom2end = Get value at time... 'tMinF0com2end' Hertz Linear


    # Rescale each f0 value to the utterance range (0 = utterance minimum, 1 = utterance maximum)
    transformedmeantop1end = ('meantop1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmeancom1end = ('meancom1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmeantop2end = ('meantop2end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmeancom2end = ('meancom2end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmaxtop1end = ('f0maxtop1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmaxcom1end = ('f0maxcom1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmaxtop2end = ('f0maxtop2end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmaxcom2end = ('f0maxcom2end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmintop1end = ('f0mintop1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmincom1end = ('f0mincom1end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmintop2end = ('f0mintop2end'-'utterancemin')/('utterancemax'-'utterancemin')
    transformedmincom2end = ('f0mincom2end'-'utterancemin')/('utterancemax'-'utterancemin')


    # Go through each individual accented syllable
    for a from 1 to naccsyltier
        select TextGrid 'baseName$'
        syllabel$ = Get label of interval... 2 'a'
        if a = 1
            counter = 0
        endif

        if syllabel$ <> ""
            startsyl = Get starting point... 2 'a'
            endsyl = Get end point... 2 'a'
            lengthsyl = (endsyl - startsyl)*1000
            counter = 'counter'+1

            # Get label of word from which the accented syllable comes
            wordatsyl = Get interval at time... 1 'startsyl'+0.0001
            wordlabel$ = Get label of interval... 1 'wordatsyl'


            # Get label of which part and word class accented word belongs to
            if syllabel$ == "T1N1" or syllabel$ == "T1A1" or syllabel$ == "T1P1"
                sylpartlabel$ = "topic1"
            endif
            if syllabel$ == "C1Aux1" or syllabel$ == "C1V1" or syllabel$ == "C1N1" or syllabel$ == "C1A1" or syllabel$ == "C1P1" or syllabel$ == "C1Art1" or syllabel$ == "C1Aux2" or syllabel$ == "C1V2" or syllabel$ == "C1N2" or syllabel$ == "C1A2" or syllabel$ == "C1P2" or syllabel$ == "C1Art2"
                sylpartlabel$ = "comment1"
            endif
            if syllabel$ == "T2N1" or syllabel$ == "T2A1"
                sylpartlabel$ = "topic2"
            endif
            if syllabel$ == "C2Aux1" or syllabel$ == "C2V1" or syllabel$ == "C2N1" or syllabel$ == "C2A1" or syllabel$ == "C2P1" or syllabel$ == "C2Art1" or syllabel$ == "C2Aux2" or syllabel$ == "C2V2" or syllabel$ == "C2N2" or syllabel$ == "C2A2" or syllabel$ == "C2P2" or syllabel$ == "C2Art2"
                sylpartlabel$ = "comment2"
            endif
            if syllabel$ == "T1N1" or syllabel$ == "T2N1" or syllabel$ == "C1N1" or syllabel$ == "C1N2" or syllabel$ == "C2N1" or syllabel$ == "C2N2"
                sylwordclasslabel$ = "N"
            endif
            if syllabel$ == "T1A1" or syllabel$ == "T2A1" or syllabel$ == "C1A1" or syllabel$ == "C1A2" or syllabel$ == "C2A1" or syllabel$ == "C2A2"
                sylwordclasslabel$ = "A"
            endif
            if syllabel$ == "T1V1" or syllabel$ == "T2V1" or syllabel$ == "C1V1" or syllabel$ == "C1V2" or syllabel$ == "C2V1" or syllabel$ == "C2V2"
                sylwordclasslabel$ = "V"
            endif
            if syllabel$ == "T1Aux1" or syllabel$ == "T2Aux1" or syllabel$ == "C1Aux1" or syllabel$ == "C1Aux2" or syllabel$ == "C2Aux1" or syllabel$ == "C2Aux2"
                sylwordclasslabel$ = "Aux"
            endif
            if syllabel$ == "T1P1" or syllabel$ == "T2P1" or syllabel$ == "C1P1" or syllabel$ == "C1P2" or syllabel$ == "C2P1" or syllabel$ == "C2P2"
                sylwordclasslabel$ = "P"
            endif
            if syllabel$ == "C1Art1" or syllabel$ == "C1Art2" or syllabel$ == "C2Art1" or syllabel$ == "C2Art2"
                sylwordclasslabel$ = "Art"
            endif
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# Get duration of word 


startword = Get starting point... 1 'wordatsyl'
endword = Get end point... 1 'wordatsyl'
lengthword = (endword - startword)*1000

# find out if the word is contrasted in the utterance or not

contratsyl = Get interval at time... 5 'startsyl'+0.0001
contrlabel$ = Get label of interval... 5 'contratsyl'
if contrlabel$ <> ""
    contrast$ = "yes"
else
    contrast$ = "no"
endif

# Find out what's before the accented syl to determine minimum points

if 'startword' = 'startsyl'
    pretonic = Get interval at time... 1 'startword'-0.0001
    pretoniclabel$ = Get label of interval... 1 'pretonic'

    if pretoniclabel$ <> ""
        firstpretonicfifth = 'startsyl'-0.15
        secondpretonicfifth = 'startsyl'-(0.15-0.15/5)
        thirdpretonicfifth = 'startsyl'-(0.15-2*0.15/5)
        fourthpretonicfifth = 'startsyl'-(0.15-3*0.15/5)
        fifthpretonicfifth = 'startsyl'-(0.15-4*0.15/5)
    endif
endif

if 'startword' < 'startsyl'
    firstpretonicfifth = 'startword'
    secondpretonicfifth = 'startword'+('startsyl'-'startword')/5
    thirdpretonicfifth = 'startword'+2*(('startsyl'-'startword')/5)
    fourthpretonicfifth = 'startword'+3*(('startsyl'-'startword')/5)
    fifthpretonicfifth = 'startword'+4*(('startsyl'-'startword')/5)
endif

# move on to pitch object


select Pitch 'baseName$'

meansyl = Get mean... 'startsyl' 'endsyl' Hertz
tMaxF0syl = Get time of maximum... 'startsyl' 'endsyl' Hertz Parabolic
tMinF0syl = Get time of minimum... 'startsyl' 'endsyl' Hertz Parabolic
f0maxsyl = Get value at time... 'tMaxF0syl' Hertz Linear
f0minsyl = Get value at time... 'tMinF0syl' Hertz Linear

meanfirstpretonicfifth = Get mean... 'firstpretonicfifth' 'secondpretonicfifth' Hertz
meansecondpretonicfifth = Get mean... 'secondpretonicfifth' 'thirdpretonicfifth' Hertz
meanthirdpretonicfifth = Get mean... 'thirdpretonicfifth' 'fourthpretonicfifth' Hertz
meanfourthpretonicfifth = Get mean... 'fourthpretonicfifth' 'fifthpretonicfifth' Hertz
meanfifthpretonicfifth = Get mean... 'fifthpretonicfifth' 'startsyl' Hertz

transformedmeansyl = ('meansyl'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmaxsyl = ('f0maxsyl'-'utterancemin')/('utterancemax'-'utterancemin')
transformedminsyl = ('f0minsyl'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmeanfirstpretonicfifth = ('meanfirstpretonicfifth'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmeansecondpretonicfifth = ('meansecondpretonicfifth'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmeanthirdpretonicfifth = ('meanthirdpretonicfifth'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmeanfourthpretonicfifth = ('meanfourthpretonicfifth'-'utterancemin')/('utterancemax'-'utterancemin')
transformedmeanfifthpretonicfifth = ('meanfifthpretonicfifth'-'utterancemin')/('utterancemax'-'utterancemin')

# save edited pitch object

Save as text file: "'directory$'\'baseName$'.Pitch"

# Append values in the TXT file
select TextGrid 'baseName$' 


fileappend "'directory$'/results.txt" 'baseName$''tab$''wordlabel$''tab$'
...'naccsyls''tab$'
...'startutterance''tab$''endutterance''tab$'
...'lengthutterance''tab$''syllabel$''tab$''contrast$''tab$''counter''tab$''sylpartlabel$''tab$''sylwordclasslabel$''tab$'
...'startsyl''tab$''endsyl''tab$''lengthsyl''tab$'
...'meansyl''tab$''tMaxF0syl''tab$'
...'tMinF0syl''tab$''f0maxsyl''tab$''f0minsyl''tab$'
...'meanfirstpretonicfifth''tab$''meansecondpretonicfifth''tab$'
...'meanthirdpretonicfifth''tab$''meanfourthpretonicfifth''tab$'
...'meanfifthpretonicfifth''tab$''transformedmeansyl''tab$'
...'transformedmaxsyl''tab$''transformedminsyl''tab$'
...'transformedmeanfirstpretonicfifth''tab$''transformedmeansecondpretonicfifth''tab$'
...'transformedmeanthirdpretonicfifth''tab$''transformedmeanfourthpretonicfifth''tab$'
...'transformedmeanfifthpretonicfifth''tab$''startword''tab$'
...'endword''tab$''lengthword''tab$'
...'firstextremelabel$''tab$''secondextremelabel$''tab$'
...'utterancemax''tab$''utterancemin''tab$'
...'posutterancemax''tab$''posutterancemin''tab$'
...'top1endwordlabel$''tab$''starttop1end''tab$''endtop1end''tab$'
...'meantop1end''tab$''f0maxtop1end''tab$''tMaxF0top1end''tab$'
...'f0mintop1end''tab$''tMinF0top1end''tab$'
...'transformedmeantop1end''tab$''transformedmaxtop1end''tab$''transformedmintop1end''tab$'
...'com1endwordlabel$''tab$''startcom1end''tab$''endcom1end''tab$'
...'meancom1end''tab$''f0maxcom1end''tab$''tMaxF0com1end''tab$'
...'f0mincom1end''tab$''tMinF0com1end''tab$'
...'transformedmeancom1end''tab$''transformedmaxcom1end''tab$''transformedmincom1end''tab$'
...'top2endwordlabel$''tab$''starttop2end''tab$''endtop2end''tab$'
...'meantop2end''tab$''f0maxtop2end''tab$''tMaxF0top2end''tab$'
...'f0mintop2end''tab$''tMinF0top2end''tab$'
...'transformedmeantop2end''tab$''transformedmaxtop2end''tab$''transformedmintop2end''tab$'
...'com2endwordlabel$''tab$''startcom2end''tab$''endcom2end''tab$'
...'meancom2end''tab$''f0maxcom2end''tab$''tMaxF0com2end''tab$'
...'f0mincom2end''tab$''tMinF0com2end''tab$'
...'transformedmeancom2end''tab$''transformedmaxcom2end''tab$''transformedmincom2end''tab$'
...'newline$'


endif 
endfor 


# Clean up before going on with next file 
select Sound 'baseName$' 
plus TextGrid 'baseName$'
plus Pitch 'baseName$'
Remove 


endfor 


# Final clean up 
select Strings list 
Remove 

clearinfo 

print All files done! 
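
The two calculations at the core of the script above, namely cutting the stretch before the accented syllable into five equal slices and rescaling an F0 value to the speaker's range within the utterance, can be tried out in isolation. The following is only a minimal sketch in Praat's scripting language, not part of the original script: all numeric values are invented for illustration, and the helper variable "step" does not occur in the original. It covers the case in which the accented syllable is not word-initial ('startword' < 'startsyl'); when it is word-initial, the script instead uses a fixed window of 150 ms before the syllable onset.

```praat
# Hypothetical example values (seconds and Hertz); not taken from the data
startword = 1.00
startsyl = 1.25
utterancemin = 100
utterancemax = 300
meansyl = 150

# Width of one slice: a fifth of the stretch between word onset and
# onset of the accented syllable ("step" is a helper name assumed here)
step = (startsyl - startword) / 5

# Left edges of the five slices, named as in the script above
firstpretonicfifth = startword
secondpretonicfifth = startword + step
thirdpretonicfifth = startword + 2 * step
fourthpretonicfifth = startword + 3 * step
fifthpretonicfifth = startword + 4 * step

# Rescale an F0 value to the utterance range:
# 0 corresponds to the utterance minimum, 1 to the utterance maximum
transformedmeansyl = (meansyl - utterancemin) / (utterancemax - utterancemin)

writeInfoLine: "slice width (s): ", step
appendInfoLine: "left edge of fifth slice (s): ", fifthpretonicfifth
appendInfoLine: "normalized mean F0: ", transformedmeansyl
```

Since every value is divided by the same utterance range, the transformed values of different speakers and utterances end up on a common scale between 0 and 1, which is presumably why the script stores them alongside the raw Hertz values.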
#################################################################
### intonation_slopes.praat
###
### Timo Buchholz
### timobuch@zedat.fu-berlin.de
### March 2018
###
###
### Praat 5.3.56
#################################################################
form Select directories (without final slash)
    comment Where are the WAV files kept?
    sentence wav_dir C:\User\Folder
    comment Where are the TextGrid files kept?
    sentence txt_dir C:\User\Folder
    comment Where should the results file be kept?
    sentence directory C:\User\Folder
    comment Where are the pitch files kept?
    sentence pitch_dir C:\User\Folder
    comment What is the name of the first (main) speaker?
    sentence speakerOne XU33
    comment Which is the tier with the syllables when the main speaker is speaking?
    integer speakersyltier 4
    comment Which is the tier with the word labels?
    integer wordtier 5
    comment Which language
    sentence language Quechua
    comment Which type of utterance
    sentence typeutterance pos. resp
endform


# Create TXT file with header 

filedelete 'directory$'/results.txt 

header_row$ = "Word.label" + tab$ + "File" + tab$ + "Speaker.name" + tab$ +
... "language" + tab$ + "type.utterance" + tab$ + "number.words.utterance" + tab$ +
... "start.time.utterance" + tab$ + "end.time.utterance" + tab$ +
... "length.utterance" + tab$ + "number.syls.word" + tab$ + "position" + tab$ +
... "initial.slope" + tab$ + "final.slope" + tab$ +
... "start.time.word" + tab$ + "end.time.word" + tab$ +
... "length.word" + tab$ +
... "initial.pitchrange" + tab$ + "final.pitchrange" + tab$ +
... "initial.tone.distance" + tab$ + "final.tone.distance" + tab$ + "initial.Max" + tab$ + "final.Max" + tab$ +
... "initial.Min" + tab$ + "final.Min" + tab$ + "initial.pos.Max" + tab$ + "final.pos.Max" + tab$ + "initial.pos.Min" + tab$ + "final.pos.Min" + tab$ +
... "firstsyluttmean" + tab$ + "lastsyluttmean" + tab$ +
... "meanF0.firstsyl" + tab$ + "meanF0.secondsyl" + tab$ + "meanF0.penult" + tab$ + "meanF0.finalsyl" + tab$ +
... "distance.init.max.from.end.of.firstsyl" + tab$ + "distance.init.min.from.end.of.firstsyl" + tab$ +
... "distance.final.max.from.end.of.penult" + tab$ + "distance.final.min.from.end.of.penult" + tab$ +
... "start.firstsyl" + tab$ + "end.firstsyl" + tab$ + "end.secondsyl" + tab$ + "start.antepenult" + tab$ +
... "start.penult" + tab$ + "end.penult" + tab$ + "end.finalsyl" + tab$ +
... "length.firsttwosyls" + tab$ + "length.lasttwosyls" + tab$ +
... "initial.peak.pos" + tab$ + "initial.low.pos" + tab$ + "final.peak.pos" + tab$ + "final.low.pos" + tab$ +
... "last.syl.in.word.voiceless" + tab$ +
... newline$
header_row$ > 'directory$'/results.txt


# Check whether there are pitch files already that have been used and corrected
# manually before. WARNING: You must have one pitch file per each wav file for this
# to work. If you don't, the script will overwrite your existing pitch files

# List all pitch files

Create Strings as file list... list 'pitch_dir$'/*.Pitch
numberofPitchfiles = Get number of strings

# List all WAV files
Create Strings as file list... list 'wav_dir$'/*.wav
numberOfFiles = Get number of strings


# Loop that goes through all files 


for ifile to numberOfFiles 
select Strings list 
fileName$ = Get string... ifile
baseName$ = fileName$ - ".wav"
# Read in the Sound files with that base name
Read from file... 'wav_dir$'/'baseName$'.wav
Read from file... 'txt_dir$'/'baseName$'.TextGrid

if 'numberofPitchfiles' = 'numberOfFiles'
    Read from file... 'txt_dir$'/'baseName$'.Pitch
else
    # Create Pitch object
    select Sound 'baseName$'
    To Pitch (ac)... 0.005 75 15 no 0.03 0.45 0.01 0.35 0.30 600
endif

# Select the individual words

select TextGrid 'baseName$'

# Determine the number of non-empty intervals on the word tier, i.e. the number of
# words in the utterance


nwordintervaltier = Get number of intervals... 'wordtier'
nemptyintervalswordtier = Count intervals where: 'wordtier', "is equal to", ""
nwords = 'nwordintervaltier' - 'nemptyintervalswordtier'

# Get start and end times for utterance (including pauses)

startutterance = Get starting point... 'wordtier' 2
endutterance = Get end point... 'wordtier' 'nwordintervaltier'-1
lengthutterance = (endutterance - startutterance)*1000

startfile = Get starting point... 'wordtier' 1
firstsyl = Get interval at time... 'speakersyltier' 'startutterance'+0.0001
lastsyl = Get interval at time... 'speakersyltier' 'endutterance'-0.0001
firstsyluttend = Get end point... 'speakersyltier' 'firstsyl'
lastsylbegin = Get start point... 'speakersyltier' 'lastsyl'

select Pitch 'baseName$'
View & Edit

pause Edit pitch object, then press "continue" to save and proceed to the next file


select TextGrid 'baseName$' 
# Go through each individual word 


for a from 1 to nwordintervaltier 
select TextGrid 'baseName$' 
wordlabel$ = Get label of interval... 'wordtier' 'a'

if wordlabel$ <> ""
startword = Get starting point... 'wordtier' 'a'
endword = Get end point... 'wordtier' 'a'
lengthword = (endword - startword)*1000

# Get number of syllables on the word

sylintervalstartword = Get interval at time... 'speakersyltier' 'startword'+0.0001
sylintervalendword = Get interval at time... 'speakersyltier' 'endword'-0.0001
numbersyls = 'sylintervalendword' - 'sylintervalstartword' +1

firstsylwordend = Get end point... 'speakersyltier' 'sylintervalstartword'
secondsylend = Get end point... 'speakersyltier' 'sylintervalstartword'+1
penultbegin = Get start point... 'speakersyltier' 'sylintervalendword'-1
penultend = Get end point... 'speakersyltier' 'sylintervalendword'-1

if 'sylintervalendword'-2 >= 1
    antepenultbegin = Get start point... 'speakersyltier' 'sylintervalendword'-2
endif

lengthfirstsecondsyl = 'secondsylend'-'startword'
lengthpenultfinalsyl = 'endword'-'penultbegin'

# Find out whether the word is initial medial or final

leftwordinterval = Get interval at time... 'wordtier' 'startword'-0.0001
rightwordinterval = Get interval at time... 'wordtier' 'endword'+0.0001
leftwordlabel$ = Get label of interval... 'wordtier' 'leftwordinterval'
rightwordlabel$ = Get label of interval... 'wordtier' 'rightwordinterval'

if leftwordlabel$ <> "" and rightwordlabel$ <> ""
    wordposition$ = "medial"
endif

if leftwordlabel$ <> "" and rightwordlabel$ = ""
    wordposition$ = "final"
endif

if leftwordlabel$ = "" and rightwordlabel$ <> ""
    wordposition$ = "initial"
endif

if leftwordlabel$ = "" and rightwordlabel$ = ""
    wordposition$ = "standalone"
endif

# Move on to pitch object

select Pitch 'baseName$'

meanF0_firstsyl = Get mean... ('startutterance'-'startfile') ('firstsyluttend'-'startfile') Hertz
meanF0_lastsyl = Get mean... ('lastsylbegin'-'startfile') ('endutterance'-'startfile') Hertz


# test whether the whole word is voiceless and then proceed accordingly

testmeanF0_word = Get mean... ('startword'-'startfile') ('endword'-'startfile') Hertz
testmeanF0_wordstring$ = "'testmeanF0_word'"

# THIS IS THE !!!OPTIONAL!!! IF-CONDITION CHECKING WHETHER THE WORD HAS ANY VOICING
# AT ALL - IT'S NOT TABSPACED THE WAY IT SHOULD BE BECAUSE THAT WOULD HAVE MEANT
# TABSPACING EVERYTHING ELSE AGAIN BUT IT IS SURROUNDED BY THIS WARNING TO MAKE YOU
# AWARE

# if testmeanF0_wordstring$ <> "--undefined--"

# THIS IS THE !!!OPTIONAL!!! IF-CONDITION CHECKING WHETHER THE WORD HAS ANY VOICING
# AT ALL - IT'S NOT TABSPACED THE WAY IT SHOULD BE BECAUSE THAT WOULD HAVE MEANT
# TABSPACING EVERYTHING ELSE AGAIN BUT IT IS SURROUNDED BY THIS WARNING TO MAKE YOU
# AWARE

# test whether last syllable is voiceless and then proceed accordingly

testlastsylF0 = Get mean... 'penultend'-'startfile' 'endword'-'startfile' Hertz
testlastsylF0string$ = "'testlastsylF0'"

if testlastsylF0string$ = "--undefined--"
    reducednumbersyls = 'numbersyls'-1
else
    reducednumbersyls = 'numbersyls'
endif

if 'reducednumbersyls' = 'numbersyls'

if 'numbersyls' < 2

    tMaxF0init = Get time of maximum... 'startword'-'startfile' 'endword'-'startfile' Hertz Parabolic
    tMinF0init = Get time of minimum... 'startword'-'startfile' 'endword'-'startfile' Hertz Parabolic

    f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear
    f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear

    maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'
    mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

    maxdistfromendofpenult = 'maxdistfromendoffirstsyl'
    mindistfromendofpenult = 'mindistfromendoffirstsyl'

    initialpitchrange = 'f0maxsylinit'-'f0minsylinit'
    initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000

    initialslope = 'initialpitchrange' / 'initialtonedistance'
    finalslope = 'initialpitchrange' / 'initialtonedistance'
    finalpitchrange = 'initialpitchrange'
    finaltonedistance = 'initialtonedistance'

    finalpeaksyl$ = "'NA'"
    finallowsyl$ = "'NA'"
    initialpeaksyl$ = "'NA'"
    initiallowsyl$ = "'NA'"

    tMaxF0initreal = 'tMaxF0init'+'startfile'
    tMinF0initreal = 'tMinF0init'+'startfile'

    tMaxF0finreal = 'tMaxF0initreal'
    tMinF0finreal = 'tMinF0initreal'

    f0maxsylfin = 'f0maxsylinit'
    f0minsylfin = 'f0minsylinit'

    meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz

    meanF0firstsylstring$ = "'meanF0firstsyl'"
    meanF0secondsylstring$ = "'NA'"
    meanF0penultstring$ = "'NA'"
    meanF0finalsylstring$ = "'NA'"

    initialslopestring$ = "'initialslope'"
    finalslopestring$ = "'finalslope'"
    initialpitchrangestring$ = "'initialpitchrange'"
    finalpitchrangestring$ = "'finalpitchrange'"
    initialtonedistancestring$ = "'initialtonedistance'"
    finaltonedistancestring$ = "'finaltonedistance'"

    f0maxsylinitstring$ = "'f0maxsylinit'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0minsylfinstring$ = "'f0minsylfin'"

    tMaxF0initrealstring$ = "'tMaxF0initreal'"
    tMinF0initrealstring$ = "'tMinF0initreal'"
    tMaxF0finrealstring$ = "'tMaxF0finreal'"
    tMinF0finrealstring$ = "'tMinF0finreal'"

    secondsylendstring$ = "'NA'"
    antepenultbeginstring$ = "'NA'"
    penultbeginstring$ = "'NA'"
    penultendstring$ = "'NA'"

    maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
    mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
    maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
    mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

endif
if 'numbersyls' = 2

    tMaxF0init = Get time of maximum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic
    tMinF0init = Get time of minimum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic

    f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear
    f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear

    if 'tMaxF0init' <= ('secondsylend'-'startfile') and 'tMaxF0init' > ('firstsylwordend'-'startfile')
        initialpeaksyl$ = "second"
    endif

    if 'tMaxF0init' <= ('firstsylwordend'-'startfile') and 'tMaxF0init' >= ('startword'-'startfile')
        initialpeaksyl$ = "first"
    endif

    if 'tMinF0init' <= ('secondsylend'-'startfile') and 'tMinF0init' > ('firstsylwordend'-'startfile')
        initiallowsyl$ = "second"
    endif

    if 'tMinF0init' <= ('firstsylwordend'-'startfile') and 'tMinF0init' >= ('startword' - 'startfile')
        initiallowsyl$ = "first"
    endif

    maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'
    mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

    initialpitchrange = 'f0maxsylinit'-'f0minsylinit'
    initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000

    finalpeaksyl$ = "'initialpeaksyl$'"
    finallowsyl$ = "'initiallowsyl$'"

    initialslope = 'initialpitchrange' / 'initialtonedistance'

    tMaxF0fin = Get time of maximum... 'penultbegin'-'startfile' 'endword'-'startfile' Hertz Parabolic
    tMinF0fin = Get time of minimum... 'penultbegin'-'startfile' 'endword'-'startfile' Hertz Parabolic

    f0maxsylfin = Get value at time... 'tMaxF0fin' Hertz Linear
    f0minsylfin = Get value at time... 'tMinF0fin' Hertz Linear

    finalpitchrange = 'f0maxsylfin'-'f0minsylfin'
    finaltonedistance = ('tMaxF0fin' - 'tMinF0fin')*1000
    finalslope = 'finalpitchrange' / 'finaltonedistance'

    tMaxF0initreal = 'tMaxF0init'+'startfile'
    tMinF0initreal = 'tMinF0init'+'startfile'
    tMaxF0finreal = 'tMaxF0fin'+'startfile'
    tMinF0finreal = 'tMinF0fin'+'startfile'

    meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz
    meanF0finalsyl = Get mean... 'penultend'-'startfile' 'endword'-'startfile' Hertz

    maxdistfromendofpenult = 'maxdistfromendoffirstsyl'
    mindistfromendofpenult = 'mindistfromendoffirstsyl'

    meanF0firstsylstring$ = "'meanF0firstsyl'"
    meanF0secondsylstring$ = "'NA'"
    meanF0penultstring$ = "'NA'"
    meanF0finalsylstring$ = "'meanF0finalsyl'"

    initialslopestring$ = "'initialslope'"
    finalslopestring$ = "'finalslope'"
    initialpitchrangestring$ = "'initialpitchrange'"
    finalpitchrangestring$ = "'finalpitchrange'"
    initialtonedistancestring$ = "'initialtonedistance'"
    finaltonedistancestring$ = "'finaltonedistance'"

    f0maxsylinitstring$ = "'f0maxsylinit'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0minsylfinstring$ = "'f0minsylfin'"

    tMaxF0initrealstring$ = "'tMaxF0initreal'"
    tMinF0initrealstring$ = "'tMinF0initreal'"
    tMaxF0finrealstring$ = "'tMaxF0finreal'"
    tMinF0finrealstring$ = "'tMinF0finreal'"

    secondsylendstring$ = "'secondsylend'"
    antepenultbeginstring$ = "'NA'"
    penultbeginstring$ = "'NA'"
    penultendstring$ = "'NA'"

    maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
    mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
    maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
    mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

endif
if 'numbersyls' > 2

    tMaxF0init = Get time of maximum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic
    tMinF0init = Get time of minimum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic

    f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear
    f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear

    if 'tMaxF0init' <= ('secondsylend'-'startfile') and 'tMaxF0init' > ('firstsylwordend'-'startfile')
        initialpeaksyl$ = "second"
    endif

    if 'tMaxF0init' <= ('firstsylwordend'-'startfile') and 'tMaxF0init' >= ('startword'-'startfile')
        initialpeaksyl$ = "first"
    endif

    if 'tMinF0init' <= ('secondsylend'-'startfile') and 'tMinF0init' > ('firstsylwordend'-'startfile')
        initiallowsyl$ = "second"
    endif

    if 'tMinF0init' <= ('firstsylwordend'-'startfile') and 'tMinF0init' >= ('startword'-'startfile')
        initiallowsyl$ = "first"
    endif

    maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'
    mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

    initialpitchrange = 'f0maxsylinit'-'f0minsylinit'
    initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000
    initialslope = 'initialpitchrange' / 'initialtonedistance'

    tMaxF0fin = Get time of maximum... 'penultbegin'-'startfile' 'endword'-'startfile' Hertz Parabolic
    tMinF0fin = Get time of minimum... 'penultbegin'-'startfile' 'endword'-'startfile' Hertz Parabolic

    f0maxsylfin = Get value at time... 'tMaxF0fin' Hertz Linear
    f0minsylfin = Get value at time... 'tMinF0fin' Hertz Linear

    if 'tMaxF0fin' <= ('endword'-'startfile') and 'tMaxF0fin' > ('penultend'-'startfile')
        finalpeaksyl$ = "final"
    endif

    if 'tMaxF0fin' <= ('penultend'-'startfile') and 'tMaxF0fin' >= ('penultbegin'-'startfile')
        finalpeaksyl$ = "penult"
    endif

    if 'tMinF0fin' <= ('endword'-'startfile') and 'tMinF0fin' > ('penultend'-'startfile')
        finallowsyl$ = "final"
    endif

    if 'tMinF0fin' <= ('penultend'-'startfile') and 'tMinF0fin' >= ('penultbegin' - 'startfile')
        finallowsyl$ = "penult"
    endif

    maxdistfromendofpenult = 'tMaxF0fin' + 'startfile' - 'penultend'
    mindistfromendofpenult = 'tMinF0fin' + 'startfile' - 'penultend'

    finalpitchrange = 'f0maxsylfin'-'f0minsylfin'
    finaltonedistance = ('tMaxF0fin' - 'tMinF0fin')*1000
    finalslope = 'finalpitchrange' / 'finaltonedistance'

    tMaxF0initreal = 'tMaxF0init'+'startfile'
    tMinF0initreal = 'tMinF0init'+'startfile'
    tMaxF0finreal = 'tMaxF0fin'+'startfile'
    tMinF0finreal = 'tMinF0fin'+'startfile'

    meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz
    meanF0secondsyl = Get mean... 'firstsylwordend'-'startfile' 'secondsylend'-'startfile' Hertz
    meanF0penult = Get mean... 'penultbegin'-'startfile' 'penultend'-'startfile' Hertz
    meanF0finalsyl = Get mean... 'penultend'-'startfile' 'endword'-'startfile' Hertz

    meanF0firstsylstring$ = "'meanF0firstsyl'"
    meanF0secondsylstring$ = "'meanF0secondsyl'"
    meanF0penultstring$ = "'meanF0penult'"
    meanF0finalsylstring$ = "'meanF0finalsyl'"

    initialslopestring$ = "'initialslope'"
    finalslopestring$ = "'finalslope'"
    initialpitchrangestring$ = "'initialpitchrange'"
    finalpitchrangestring$ = "'finalpitchrange'"
    initialtonedistancestring$ = "'initialtonedistance'"
    finaltonedistancestring$ = "'finaltonedistance'"

    f0maxsylinitstring$ = "'f0maxsylinit'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0minsylfinstring$ = "'f0minsylfin'"

    tMaxF0initrealstring$ = "'tMaxF0initreal'"
    tMinF0initrealstring$ = "'tMinF0initreal'"
    tMaxF0finrealstring$ = "'tMaxF0finreal'"
    tMinF0finrealstring$ = "'tMinF0finreal'"

    secondsylendstring$ = "'secondsylend'"
    antepenultbeginstring$ = "'antepenultbegin'"
    penultbeginstring$ = "'penultbegin'"
    penultendstring$ = "'penultend'"

    maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
    mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
    maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
    mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

endif

lastsylvoiceless$ = "NO"
endif
if 'reducednumbersyls' = 'numbersyls'-1

if 'reducednumbersyls' < 2

    tMaxF0init = Get time of maximum... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz Parabolic
    tMinF0init = Get time of minimum... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz Parabolic

    tMaxF0initstring$ = "'tMaxF0init'"
    tMinF0initstring$ = "'tMinF0init'"

    if tMaxF0initstring$ <> "--undefined--"

        f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear
        f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear

        maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'
        mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

        maxdistfromendofpenult = 'maxdistfromendoffirstsyl'
        mindistfromendofpenult = 'mindistfromendoffirstsyl'

        initialpitchrange = 'f0maxsylinit'-'f0minsylinit'
        initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000

        initialslope = 'initialpitchrange' / 'initialtonedistance'
        finalslope = 'initialpitchrange' / 'initialtonedistance'
        finalpitchrange = 'initialpitchrange'
        finaltonedistance = 'initialtonedistance'

        tMaxF0initreal = 'tMaxF0init'+'startfile'
        tMinF0initreal = 'tMinF0init'+'startfile'
        tMaxF0finreal = 'tMaxF0initreal'
        tMinF0finreal = 'tMinF0initreal'

        f0maxsylfin = 'f0maxsylinit'
        f0minsylfin = 'f0minsylinit'

        meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz

        meanF0firstsylstring$ = "'meanF0firstsyl'"
        meanF0secondsylstring$ = "'NA'"
        meanF0penultstring$ = "'NA'"
        meanF0finalsylstring$ = "'NA'"

        initialslopestring$ = "'initialslope'"
        finalslopestring$ = "'finalslope'"
        initialpitchrangestring$ = "'initialpitchrange'"
        finalpitchrangestring$ = "'finalpitchrange'"
        initialtonedistancestring$ = "'initialtonedistance'"
        finaltonedistancestring$ = "'finaltonedistance'"

        f0maxsylinitstring$ = "'f0maxsylinit'"
        f0minsylinitstring$ = "'f0minsylinit'"
        f0maxsylfinstring$ = "'f0maxsylfin'"
        f0maxsylfinstring$ = "'f0maxsylfin'"
        f0minsylinitstring$ = "'f0minsylinit'"
        f0minsylfinstring$ = "'f0minsylfin'"

        tMaxF0initrealstring$ = "'tMaxF0initreal'"
        tMinF0initrealstring$ = "'tMinF0initreal'"
        tMaxF0finrealstring$ = "'tMaxF0finreal'"
        tMinF0finrealstring$ = "'tMinF0finreal'"

        maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
        mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
        maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
        mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

    else

        initialslopestring$ = "'NA'"
        finalslopestring$ = "'NA'"
        initialpitchrangestring$ = "'NA'"
        finalpitchrangestring$ = "'NA'"
        initialtonedistancestring$ = "'NA'"
        finaltonedistancestring$ = "'NA'"

        f0maxsylinitstring$ = "'NA'"
        f0minsylinitstring$ = "'NA'"
        f0maxsylfinstring$ = "'NA'"
        f0maxsylfinstring$ = "'NA'"
        f0minsylinitstring$ = "'NA'"
        f0minsylfinstring$ = "'NA'"

        tMaxF0initrealstring$ = "'NA'"
        tMinF0initrealstring$ = "'NA'"
        tMaxF0finrealstring$ = "'NA'"
        tMinF0finrealstring$ = "'NA'"

        meanF0firstsylstring$ = "'NA'"
        meanF0secondsylstring$ = "'NA'"
        meanF0penultstring$ = "'NA'"
        meanF0finalsylstring$ = "'NA'"

        maxdistfromendoffirstsylstring$ = "'NA'"
        mindistfromendoffirstsylstring$ = "'NA'"
        maxdistfromendofpenultstring$ = "'NA'"
        mindistfromendofpenultstring$ = "'NA'"

    endif

    finalpeaksyl$ = "'NA'"
    finallowsyl$ = "'NA'"
    initialpeaksyl$ = "'NA'"
    initiallowsyl$ = "'NA'"
    secondsylendstring$ = "'NA'"
    antepenultbeginstring$ = "'NA'"
    penultbeginstring$ = "'NA'"
    penultendstring$ = "'NA'"
endif
if 'reducednumbersyls' = 2

    tMaxF0init = Get time of maximum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic
    tMinF0init = Get time of minimum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic

    f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear
    f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear

    if 'tMaxF0init' <= ('secondsylend'-'startfile') and 'tMaxF0init' > ('firstsylwordend'-'startfile')
        initialpeaksyl$ = "second"
    endif

    if 'tMaxF0init' <= ('firstsylwordend'-'startfile') and 'tMaxF0init' >= ('startword'-'startfile')
        initialpeaksyl$ = "first"
    endif

    if 'tMinF0init' <= ('secondsylend'-'startfile') and 'tMinF0init' > ('firstsylwordend'-'startfile')
        initiallowsyl$ = "second"
    endif

    if 'tMinF0init' <= ('firstsylwordend'-'startfile') and 'tMinF0init' >= ('startword' - 'startfile')
        initiallowsyl$ = "first"
    endif

    finalpeaksyl$ = "'initialpeaksyl$'"
    finallowsyl$ = "'initiallowsyl$'"

    maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'
    mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

    initialpitchrange = 'f0maxsylinit'-'f0minsylinit'
    initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000
    initialslope = 'initialpitchrange' / 'initialtonedistance'

    tMaxF0fin = 'tMaxF0init'
    tMinF0fin = 'tMinF0init'

    f0maxsylfin = Get value at time... 'tMaxF0fin' Hertz Linear
    f0minsylfin = Get value at time... 'tMinF0fin' Hertz Linear

    finalpitchrange = 'f0maxsylfin'-'f0minsylfin'
    finaltonedistance = ('tMaxF0fin' - 'tMinF0fin')*1000
    finalslope = 'finalpitchrange' / 'finaltonedistance'

    tMaxF0initreal = 'tMaxF0init'+'startfile'
    tMinF0initreal = 'tMinF0init'+'startfile'
    tMaxF0finreal = 'tMaxF0fin'+'startfile'
    tMinF0finreal = 'tMinF0fin'+'startfile'

    maxdistfromendofpenult = 'maxdistfromendoffirstsyl'
    mindistfromendofpenult = 'mindistfromendoffirstsyl'

    meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz
    meanF0finalsyl = Get mean... 'firstsylwordend'-'startfile' 'secondsylend'-'startfile' Hertz

    meanF0firstsylstring$ = "'meanF0firstsyl'"
    meanF0secondsylstring$ = "'NA'"
    meanF0penultstring$ = "'NA'"
    meanF0finalsylstring$ = "'meanF0finalsyl'"

    initialslopestring$ = "'initialslope'"
    finalslopestring$ = "'finalslope'"
    initialpitchrangestring$ = "'initialpitchrange'"
    finalpitchrangestring$ = "'finalpitchrange'"
    initialtonedistancestring$ = "'initialtonedistance'"
    finaltonedistancestring$ = "'finaltonedistance'"

    f0maxsylinitstring$ = "'f0maxsylinit'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0maxsylfinstring$ = "'f0maxsylfin'"
    f0minsylinitstring$ = "'f0minsylinit'"
    f0minsylfinstring$ = "'f0minsylfin'"

    tMaxF0initrealstring$ = "'tMaxF0initreal'"
    tMinF0initrealstring$ = "'tMinF0initreal'"
    tMaxF0finrealstring$ = "'tMaxF0finreal'"
    tMinF0finrealstring$ = "'tMinF0finreal'"

    secondsylendstring$ = "'secondsylend'"
    antepenultbeginstring$ = "'NA'"
    penultbeginstring$ = "'NA'"
    penultendstring$ = "'NA'"

    maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
    mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
    maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
    mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

endif
if 'reducednumbersyls' > 2 


tMaxF0init = Get time of maximum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic

tMinF0init = Get time of minimum... 'startword'-'startfile' 'secondsylend'-'startfile' Hertz Parabolic

f0maxsylinit = Get value at time... 'tMaxF0init' Hertz Linear

f0minsylinit = Get value at time... 'tMinF0init' Hertz Linear


if 'tMaxF0init' <= ('secondsylend'-'startfile') and 'tMaxF0init' > ('firstsylwordend'-'startfile')
initialpeaksyl$ = "second"
endif

if 'tMaxF0init' <= ('firstsylwordend'-'startfile') and 'tMaxF0init' >= ('startword'-'startfile')
initialpeaksyl$ = "first"
endif

if 'tMinF0init' <= ('secondsylend'-'startfile') and 'tMinF0init' > ('firstsylwordend'-'startfile')
initiallowsyl$ = "second"
endif

if 'tMinF0init' <= ('firstsylwordend'-'startfile') and 'tMinF0init' >= ('startword' - 'startfile')
initiallowsyl$ = "first"
endif

maxdistfromendoffirstsyl = 'tMaxF0init' + 'startfile' - 'firstsylwordend'

mindistfromendoffirstsyl = 'tMinF0init' + 'startfile' - 'firstsylwordend'

initialpitchrange = 'f0maxsylinit'-'f0minsylinit'

initialtonedistance = ('tMaxF0init' - 'tMinF0init')*1000

initialslope = 'initialpitchrange' / 'initialtonedistance'


tMaxF0fin = Get time of maximum... 'antepenultbegin'-'startfile' 'penultend'-'startfile' Hertz Parabolic

tMinF0fin = Get time of minimum... 'antepenultbegin'-'startfile' 'penultend'-'startfile' Hertz Parabolic

f0maxsylfin = Get value at time... 'tMaxF0fin' Hertz Linear

f0minsylfin = Get value at time... 'tMinF0fin' Hertz Linear


if 'tMaxF0fin' <= ('penultend'-'startfile') and 'tMaxF0fin' > ('penultbegin'-'startfile')
finalpeaksyl$ = "penult"
endif

if 'tMaxF0fin' <= ('penultbegin'-'startfile') and 'tMaxF0fin' >= ('antepenultbegin'-'startfile')
finalpeaksyl$ = "antepenult"
endif

if 'tMinF0fin' <= ('penultend'-'startfile') and 'tMinF0fin' > ('penultbegin' - 'startfile')
finallowsyl$ = "penult"
endif

if 'tMinF0fin' <= ('penultbegin'-'startfile') and 'tMinF0fin' >= ('antepenultbegin'-'startfile')
finallowsyl$ = "antepenult"
endif

finalpitchrange = 'f0maxsylfin'-'f0minsylfin'

finaltonedistance = ('tMaxF0fin' - 'tMinF0fin')*1000

maxdistfromendofpenult = 'tMaxF0fin' + 'startfile' - 'penultbegin'

mindistfromendofpenult = 'tMinF0fin' + 'startfile' - 'penultbegin'

finalslope = 'finalpitchrange' / 'finaltonedistance'

tMaxF0initreal = 'tMaxF0init'+'startfile'

tMinF0initreal = 'tMinF0init'+'startfile'

tMaxF0finreal = 'tMaxF0fin'+'startfile'

tMinF0finreal = 'tMinF0fin'+'startfile'


meanF0firstsyl = Get mean... 'startword'-'startfile' 'firstsylwordend'-'startfile' Hertz
meanF0secondsyl = Get mean... 'firstsylwordend'-'startfile' 'secondsylend'-'startfile' Hertz
meanF0penult = Get mean... 'antepenultbegin'-'startfile' 'penultbegin'-'startfile' Hertz
meanF0finalsyl = Get mean... 'penultbegin'-'startfile' 'penultend'-'startfile' Hertz


meanF0firstsylstring$ = "'meanF0firstsyl'"

meanF0secondsylstring$ = "'meanF0secondsyl'"

meanF0penultstring$ = "'meanF0penult'"

meanF0finalsylstring$ = "'meanF0finalsyl'"

initialslopestring$ = "'initialslope'"

finalslopestring$ = "'finalslope'"

initialpitchrangestring$ = "'initialpitchrange'"

finalpitchrangestring$ = "'finalpitchrange'"

initialtonedistancestring$ = "'initialtonedistance'"

finaltonedistancestring$ = "'finaltonedistance'"


f0maxsylinitstring$ = "'f0maxsylinit'"
f0minsylinitstring$ = "'f0minsylinit'"
f0maxsylfinstring$ = "'f0maxsylfin'"
f0maxsylfinstring$ = "'f0maxsylfin'"
f0minsylinitstring$ = "'f0minsylinit'"
f0minsylfinstring$ = "'f0minsylfin'"

tMaxF0initrealstring$ = "'tMaxF0initreal'"
tMinF0initrealstring$ = "'tMinF0initreal'"
tMaxF0finrealstring$ = "'tMaxF0finreal'"
tMinF0finrealstring$ = "'tMinF0finreal'"

secondsylendstring$ = "'secondsylend'"
antepenultbeginstring$ = "'antepenultbegin'"
penultbeginstring$ = "'penultbegin'"
penultendstring$ = "'penultend'"

maxdistfromendoffirstsylstring$ = "'maxdistfromendoffirstsyl'"
mindistfromendoffirstsylstring$ = "'mindistfromendoffirstsyl'"
maxdistfromendofpenultstring$ = "'maxdistfromendofpenult'"
mindistfromendofpenultstring$ = "'mindistfromendofpenult'"

endif

lastsylvoiceless$ = "YES"

endif

#THIS IS THE !!!OPTIONAL!!! IF-CONDITION CHECKING WHETHER THE WORD HAS ANY VOICING AT ALL - IT'S NOT TABSPACED THE WAY IT SHOULD BE BECAUSE THAT WOULD HAVE MEANT TABSPACING EVERYTHING ELSE AGAIN BUT IT IS SURROUNDED BY THIS WARNING TO MAKE YOU AWARE
#
endif
#THIS IS THE !!!OPTIONAL!!! IF-CONDITION CHECKING WHETHER THE WORD HAS ANY VOICING AT ALL - IT'S NOT TABSPACED THE WAY IT SHOULD BE BECAUSE THAT WOULD HAVE MEANT TABSPACING EVERYTHING ELSE AGAIN BUT IT IS SURROUNDED BY THIS WARNING TO MAKE YOU AWARE

# print "initialslope" 'initialslope', "finalslope" 'finalslope', "initialpitchrange" 'initialpitchrange', "initialtonedistance" 'initialtonedistance', "finalpitchrange" 'finalpitchrange', "finaltonedistance" 'finaltonedistance'
# save edited pitch object 

Save as text file: "'directory$'\'baseName$'.Pitch"
# Append values in the TXT file

select TextGrid 'baseName$'


fileappend "'directory$'/results.txt" 'wordlabel$''tab$'
...'baseName$''tab$''speakerOne$''tab$'
...'language$''tab$''typeutterance$''tab$''nwords''tab$'
...'startutterance''tab$''endutterance''tab$'
...'lengthutterance''tab$''numbersyls''tab$'
...'wordposition$''tab$''initialslopestring$''tab$'
...'finalslopestring$''tab$''startword''tab$'
...'endword''tab$''lengthword''tab$'
...'initialpitchrangestring$''tab$''finalpitchrangestring$''tab$'
...'initialtonedistancestring$''tab$''finaltonedistancestring$''tab$''f0maxsylinitstring$''tab$'
...'f0maxsylfinstring$''tab$''f0minsylinitstring$''tab$'
...'f0minsylfinstring$''tab$''tMaxF0initrealstring$''tab$'
...'tMaxF0finrealstring$''tab$''tMinF0initrealstring$''tab$'
...'tMinF0finrealstring$''tab$'
...'meanF0_firstsyl''tab$''meanF0_lastsyl''tab$'
...'meanF0firstsylstring$''tab$''meanF0secondsylstring$''tab$'
...'meanF0penultstring$''tab$''meanF0finalsylstring$''tab$'
...'maxdistfromendoffirstsylstring$''tab$''mindistfromendoffirstsylstring$''tab$''maxdistfromendofpenultstring$''tab$''mindistfromendofpenultstring$''tab$'
...'startword''tab$''firstsylwordend''tab$''secondsylendstring$''tab$''antepenultbeginstring$''tab$''penultbeginstring$''tab$''penultendstring$''tab$''endword''tab$'
...'lengthfirstsecondsyl''tab$''lengthpenultfinalsyl''tab$''initialpeaksyl$''tab$'
...'initiallowsyl$''tab$''finalpeaksyl$''tab$''finallowsyl$''tab$''lastsylvoiceless$''tab$'
...'newline$'


endif 
endfor 


# Clean up before going on with next file 
select Sound 'baseName$' 
plus TextGrid 'baseName$' 
plus Pitch 'baseName$'
Remove 


endfor 


# Final clean up 
select Strings list 
Remove 

clearinfo 

print All files done! 
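
The slope measures in the script divide a pitch range in Hertz by a tone distance in milliseconds, for both the word-initial and the word-final search window. The following Python sketch replays that arithmetic outside Praat so it can be checked by hand. The function name, the dictionary keys, and the example values are hypothetical and not part of the script. Returning `None` when peak and valley coincide in time is an assumption of this sketch, not the script's behaviour.

```python
from typing import Optional


def slope_measures(t_max: float, f0_max: float,
                   t_min: float, f0_min: float) -> dict:
    """Replay the script's range / distance / slope arithmetic.

    t_max, t_min: times of the F0 maximum and minimum in seconds.
    f0_max, f0_min: F0 values at those times in Hertz.
    All names here are illustrative, not taken from the script.
    """
    # Corresponds to initialpitchrange / finalpitchrange (Hz).
    pitch_range = f0_max - f0_min
    # Corresponds to initialtonedistance / finaltonedistance (ms).
    tone_distance = (t_max - t_min) * 1000.0
    # Assumption of this sketch: avoid dividing by zero when the
    # maximum and the minimum fall on the same time point.
    slope: Optional[float] = None
    if tone_distance != 0:
        slope = pitch_range / tone_distance
    return {
        "pitch_range": pitch_range,
        "tone_distance": tone_distance,
        "slope": slope,
    }


# Hypothetical values: a peak of 220 Hz at 0.25 s, a valley of 180 Hz at 0.5 s.
result = slope_measures(t_max=0.25, f0_max=220.0, t_min=0.5, f0_min=180.0)
print(result)
```

Because the tone distance is the time of the maximum minus the time of the minimum, it is negative when the peak precedes the valley. A falling movement therefore yields a negative slope and a rising one a positive slope.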


Script used in section 6.1.6 — 575 
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